Journal
of Econometrtcs
61 (1994) 273-303
Stochastic
frontier
A Bayesian
North-Holland
models
perspective*
Julien van den Broeck lJnr~r.r~f~~ of Antwq~. Antwrrp. B&w Gary Koop Boston
Unrcrr.trtr,
Bo.\ton.
MA
02215,
USA
Jacek Osiewalski
Mark F.J. Steel
Recetved
March
1992, final verston
recetved October
1992
A Bayestan approach to esttmatton, predtctton, and model compartson m composed error production models 1s presented. A broad range of dtstrtbuttons on the meffictency term define the contendmg models, whtch can either be treated separately or pooled. Postertor results are derived for the individual effictenctes as well as for the parameters, and the differences with the usual samplmg-theory approach are highlighted The required numertcal mtegrattons are handled by Monte Carlo methods wtth Importance Samphng, and an emptrtcal example illustrates the procedures. Key words Composed JEL class$catron:
error models; Efficiency, Model compartson,
Cl I; C15. D24
to: Jacek Ostewalskt, Department 27, 31-510 Krakow, Poland
Correspondence
Rakowtcka
Mixmg of models; Prior elicttatton
of Econometrics,
Academy
of Economtcs,
ul.
*We gratefully acknowledge sttmulatmg discusstons wtth Jacques H. D&e, Jean-Pterre Florens, Juhin de la Horra, Knox Lovell, Mtchel Mouchart, and Chrtsttan Rttter, wtthout tmphcatmg them. The first author received support from the Belgtan Federatton for Sctentific Research (NFWO, no. S2/5-ND.E43), the second and third authors acknowledge the hospttahty of the Center for Economic Research and of Ttlburg Universtty, whereas the fourth author had a fellowshtp from the Royal Netherlands Academy of Arts and Sctences (KNAW)
0304-4076/94/$07.00
0
1994--Elsevter
Sctence B.V All rtghts reserved
274
J. van den Broeck
et al.. Stochastic
frontter
models
1. Introduction The use of frontier models, with the associated evaluation of efficiency of the economic units involved, has a relatively long-standing tradition in econometrics. We refer the reader to Bauer (1990) for a survey of the literature. The production or cost frontier itself is not supposed to be known exactly to the econometrician and, generally, some symmetric error is attached to it. A second type of error is introduced to represent deviations of the individual units from the frontier (i.e., inefficiency), and is thus, by definition, of a one-sided nature. This so called composed error framework was first introduced in Meeusen and van den Broeck (1977) and Aigner, Lovell, and Schmidt (1977), and, as Bauer (1990, p. 41) remarks, ‘the basic set of econometric estimation techniques has changed relatively little in recent years’. In fact, to the knowledge of the authors, the entire literature has been embedded in the sampling theory paradigm. Here we reconsider the basic composed error model with different sampling distributions on the efficiency term, the exponential of Meeusen and van den Broeck (1977), the half-Normal of Aigner, Lovell, and Schmidt (1977), the truncated Normal proposed by Stevenson (1980), and a gamma based on Greene (1990) in a formal Bayesian framework. The costs associated with this shift of paradigm are mainly of a computational nature, but the gains are shown to be worth the effort. We formally take parameter uncertainty into account in deriving posterior densities for the efficiencies. In addition, we can easily treat our uncertainty concerning which sampling model to use by mixing over them with posterior model probabilities as weights. This pooling approach is quite natural in a Bayesian analysis. Given a particular model, we assume that all efficiencies are drawn from the same simple distribution, only we do not know which one. We avoid mixing different distributions for each individual efficiency within the sampling model, as this would render the analysis intractable. Rather, we deal with simple tractable models and mix over them at the final stage. In case we wish to choose a particular distribution for the efficiency term, we can use Bayes factors or posterior odds as a criterion for model selection. We propose prior structures on the model-specific parameters 6, that do not affect the salient features of the different sampling models and derive a simple procedure for prior elicitation, based on only the prior median efficiency. From an economic point of view, the need to survive in a competitive environment of most economic units induces a belief that many of them are close to the frontier, i.e., full efficiency. However, given the dynamic character of competition itself, strategic policies in the long run (secular inefficiency) could keep units away from their frontier. In many cases, this will be compounded with organisatorial inefficiency in the short run [see van den Broeck et al. (1991)]. Although these economic considerations guide us in forming our prior ideas
J van den Broeck et al.. Stochastic fiont~er models
275
concerning efficiency, they do not provide us with exact functional forms for the distribution of efficiencies. The latter uncertainty will here be resolved by mixing over a number of contending efficiency distributions. An empirical application on the data set of Greene (1990) leads us to address some numerical issues, which might have influenced Greene’s results. We analyse the gamma models under a diffuse improper prior on the parameter of the efficiency sampling distribution, and use our proper prior structures on 6, to compare them with the truncated Normal model and the model which imposes full efficiency.
2. Model framework We consider Yl =
a Bayesian
h(x13B) +
VI -
approach
to stochastic
frontier
models
zi,
of the form
(1)
where y, denotes the log of the output variable (or the negative of log cost) for firm i (i = 1, . , N), x, denotes the column vector of logs of exogenous variables, vi is a symmetric disturbance capturing measurement error of the stochastic frontier,
jz= h(x,,B) + L’,, and z, is a nonnegative disturbance modelling the level of inefficiency. u, and z, are independent of each other and across firms. This is the composed error model framework proposed by Meeusen and van den Broeck (1977) and Aigner, Lovell, and Schmidt (1977). Any parametric Bayesian approach requires full specification of the likelihood function. In order to meet this requirement we assume that h( ., .) is known and is a measurable function of /I E B 5 Rk, and that vi is Normally distributed with mean 0 and variance c?.’ However, as far as z, is concerned, we shall consider several one-sided distributions, leading to different (competing) statistical models. By attaching prior probabilities to these models, our Bayesian approach naturally leads to then posterior probabilities, that tell us which of the competing models is most favoured by the data and also enable us to pool posterior inferences (about quantities of interest) based on different distributional assumptions about z,. Rather than artificially imposing a specific form for the density function of z,, we average out model uncertainty. Since the exponential distribution of Zi, proposed by Meeusen and van den Broeck (1977) has a considerable theoretical appeal, we will assume it in our
‘As we show III appendix A, the assumption spheruty,
of Normahty of c’,‘s can be replaced and all our results (except those relatmg to a’) hold
by their jomt
276
J. van den Broeck et al , Stochastrc frontier models
model 1, Mi. However, the zero mode of the exponential distribution (i.e., a gamma distribution with shape parameter 1 and unknown scale parameter 2) seemed to be too strict a restriction for some authors. Stevenson (1980) considered gamma distributed z, with shape parameter 2 or 3 as well, and Beckers and Hammond (1987) and Greene (1990) have recently presented maximum likelihood estimation for the case when Zi is gamma distributed with unknown shape parameter P E R, . In his application Greene (1990) finds p = 2.45 ( f 1.10) as the ML estimate of P. In the case of noninteger P, the ML approach requires the evaluation of integrals which have no closed form solution and for which there are no polynomial approximations available. Note that considering a gamma distribution with high P makes the shapes of the densities of u, and Zi hardly distinguishable. Here, in order to clearly distinguish between Zi and u,, and make computations relatively simple,’ we adopt gamma distributions with fixed small integer values of the shape parameter as competing hypotheses. Therefore, under Mj, P,(Z,lQ,)
=fc(Zilj,~-‘)
=
exp( -
&fjC’
K’Zi)l(Zi
2
0),
where 8, E 0, generally denotes the parameters in Mj, I(.) is the indicator function, and A > 0 is one of the parameters in 0,, j = 1,2, 3. Note that for j = 2 the shape of the density function of z, is already completely different from that in the exponential case (j = l), and thus we will restrict ourselves to j = 1,2, 3. In the statistical literature, such distributions are referred to as Erlang distributions [see Johnson and Kotz (1970, p. 166)]. As an additional ‘baseline’ model (MO) we will consider the standard (‘average practice’) model which does not allow for inefficiency considerations and formally corresponds to Zi = 0. We will also consider, as Ma, a two-parameter truncated Normal distribution z,, proposed by Stevenson (1980). The baseline model, MO, is the simplest case without composed error, i.e., where all z,‘s are zero and thus all firms are on the production frontier, which is itself stochastic. We stress that this model is clearly not meant to describe reality, but we treat it as a benchmark against which the improvements of our various composed error models will be evaluated. The model will thus correspond to the following data density:3
which will be combined with a prior density p,,(B,,) = po(p, 02) in the usual fashion to obtain Bayesian posterior and predictive results. ‘Note that our Bayesian approach space even m the simplest case (M,) 3Appendlx
B describes
the density
reqmres functions
Monte
Carlo Integration
used m this paper.
over the whole parameter
Since we assume independence generally given by
j,(e,IYax)
= PAYI
over firms, the likelihood
x3 fj,) =
=
under M, is
]“I P,(Y,IXi>Q,),
(3)
r=1
wherey = (yr, . , yN)’ and X is the matrix (x1, the posterior density will be given by P.t(8jlY>x)
function
, xN)‘. Under a prior p,( e,),
~~(fljlY,X),
K,elPj(Q,)
(4)
where we have defined
K, = P~(YIX)= h@,) l,(H,Iy>X)dQ,.
(5)
@J
In the special case of, e.g., a Cobb-Douglas production technology, which implies a linear h( x,, /I), and a natural-conjugate or Jeffreys’ diffuse prior for U,, we can analytically evaluate posterior and predictive densities under MO. This, however, will not be the case for the composed error models M,, j = 1, 2, 3, 4.
3. Posterior and predictive results under Erlang inefficiency term If MJ, j E { 1,2,3}, is the model considered, and z,, given x, and the parameters BJ, is Pj(Yt~zJx~~o~)
=fh(Y,lh(X,,P)
where the parameters are = (p, cr2, A) E 0, G Rk x R+ factorization is of interest. It density of z, given (y,, x,, 0,)
-
then the joint
distribution
z,,~2)fc(z,lj,i-1),
of y,
(6)
not model-specific, i.e., for j = 1,2,3, 0, = 0, x R+ For the purpose of inference, the opposite can be easily deduced from (6) that the conditional is
= (s>Izi-‘flt(z,Im,,02)I(Z, -1
PJ(ZllYL,%,~,,
w,;’ @ [
2
01,
where
m, = m(y,, xL, @+)= h(x,,P) - yI - c2/1,
(8)
@(.) denotes the distribution function of N(O,l), and wj, = w,(yi,xl,O*) is the appropriate integrating constant of (7). Remark that if j = 1, then (7) is just
278
J. van den Broeck et al., Stochastic
frontier
models
a truncated Normal distribution and win = 1. This result was obtained by Jondrow et al. (1982). Forj = 2, 3, wzl and w3, are, respectively, the first- and the second-order moments of that truncated Normal distribution, and therefore
w3,
=
a2
+
Let c,! = wjl@(mi/o). written as
m,wz,.
(10)
NOW,
from (6) and (7) the sampling
density
of yi can be
(11) Note that (11) is equivalent to Greene’s (1990) general formula (26). However, the correct forms of the data densities for the special cases with j = 2 and j = 3 have not been presented in the literature. Although in the exponential case (j = 1) our (11) is equivalent to Stevenson’s (1980) formula (20), his (21) and (22) for two other Erlang cases cannot be deduced from our (11) and (9HlO). The likelihood function under M, (j = 1,2,3) is a product of the densities in (11): b(UY>X)
= l,(Q*IY3X)
(12) and under any prior density pj( ~,) = pj( e*) the joint posterior density, obtained as in (4) and (5), is too complicated for analytical integration, and thus posterior moments and marginal densities will be calculated by (k + 2)-dimensional Monte Carlo integration with importance sampling. The posterior distribution summarizes all the information about the parameters which is contained in the prior and in the observed data. In frontier analysis, however, the main interest is not in the parameters themselves, but in the individual efficiency (of the firm i, i E { 1, . . , IV}) measured by ri = exp( - Zi).4 The conditional posterior distribution of z, given yi, x,, and the parameters, presented in (7) exactly corresponds to Greene’s (1990) conditional distribution (27) and constitutes the common starting point for inference about individual (in)efficiencies in both sampling theory and Bayesian approaches to stochastic frontier models. The difference lies in the treatment of the unknown parameters in (7). The sampling theory approach of Jondrow et al. (1982) and 40ur
model (1) 1s formulated
m terms of logs of variables
J van den Brorck et al, Stochasrrc frontrer models
279
Greene (1990) amounts to conditioning on 4, an estimate of d,, whereas our Bayesian approach naturally averages out uncertainty about 0, by marginalizing (7) w.r.t. the posterior density of the parameters. This leads to the following posterior densities and moments:
~Azil~,x) = Sp,(z,Iy,,x,,~,)p,(u,ly,X)d~,,
(13)
Q,
p,(r,Iy,X) = r,L’p,(z,l~,X),
rr E (0, 11,
(14)
which summarize the evidence on the individual within-sample efficiency under M,. In (14) pJ(Zily,X) is evaluated at z, = - lnr,. In (15) g(z,) is equal to zf or r: for obtaining posterior moments, and (k + 3)-dimensional integration over 0, = (p,a’,/Z) and z, will be performed by Monte Carlo. Note that in the composed error case linearity of h(x,,B) no longer leads to analytical results. The dimension of fi, k, is what really matters in our numerical integration. Apart from the within-sample posterior analysis, some predictive analysis of an out-of-sample (maybe hypothetical) firm will usually be of interest. Assuming that (1) and (6) hold for some i > N, maintaining independence across all the firms under consideration, and using f to index the unobserved (forecasted) firm, we obtain
P,(Y~IXS>Y>X)= SP,(yflxf,Bj)P,(O,lY,X)dC),, QJ
(16)
P,(z~IY,W
(17)
= ~.~~(z,l~~~-‘)p,(~ly.~)d~~ 0
as the predictive densities of the log of the actual output (ys) and its inefficiency component (zs), respectively. In (16) p,(y,lxf, Q,) is the sampling density (of y, given exogenous xf) of the same form as in (11) but with i =f; whereas in (17) pj(/lly, X) is the univariate marginal posterior density calculated from P,(Q,lY>X). Note that (13) and (17) differ substantially in their construction and terpretation. (17) is posterior to the data on all the observed firms, but prior the (yet unobserved) output of the firm 5 and therefore can be treated a Bayesian counterpart of the classical characteristics of ‘average’ inefficiency opposed to individual inefficiency). Again, we marginalize w.r.t. 2, using posterior density, whereas the standard non-Bayesian practice would be
into as (as its to
replace ,I by its estimate, 1. Useful summary measures of the ‘average’ efficiency are given by the first- and second-order moments of rr = exp( - zf) given y, X. Since, for q > 0, = (1 + qlti)-‘,
yexp( - qz)f,(z(j,i-‘)dz 0
therefore,
for j = 1,2,3,
E,(r;lY,X)
=
j,l +qn)-‘p,(iIy,X)di.
(18)
0
More directly interpretable posterior characteristics of both the individual and ‘average’ efficiency will be given by the quantiles (e.g., median) of pJ(rl 1y, X), i E { 1, . . . , N}, and pj(VJy,X),f#{ 1, . , N}, respectively.
4. Truncated Normal distribution of the inefficiency term Let us now present briefly the Bayesian analysis of the stochastic frontier model (1) with zi distributed as N(p, w’) truncated at 0, and with p E R and both unknown. This model, denoted by M4 here, was proposed by WER+, Stevenson (1980) who generalized the original (half-Normal) model of Aigner et al. (1977) where zi + IN (0, 02)1. Reparameterizing (p, o) -+ ($, w) with $ = p/w will prove useful. $ indicates how many standard deviations the truncation point 0 is from the mean of the underlying Normal distribution. Under M4 the joint distribution of the observed yi and the unobservable z,, given exogenous x,‘s and the parameters f14 = (/I, 02, $, w) E O4 c Rki3, takes the form
which can be rearranged
P4(Yi7zilx,te4)
as
=.f~(Yilht
- Iclw3w2 + 02) O”~W - O’(yl - hi)
(1
xfh z, where h, = h( xl, /I). Integration of yi under M4, i.e.,
’ o.12 + CT2 >
@($I ’
over z, E [ 0, + cc ) leads to the sampling
P4(Yilxi9O4)= C@($)l-‘@ xfh(Yilht
co2+ CT2
02~02 l(Zi 2 0)
a’$ [
4Yl
- h)
a&T2
- Il/m,W’
+ 02),
density
1 (20)
which is in agreement with formula (5) of Stevenson conditional density of z, given y,, x, and the parameters
(1980). Therefore is
cT2$hJ- 02(y, - h,) 02w2 ___ I(z, co2+ CT2 ’ w2 + CT2
(I
xf,:
z,
1
2
the
O),
(21)
a truncated Normal density, which for $ = 0, i.e., for the half-Normal distribution of z, given the parameters, reduces to the density obtained in Jondrow et al. (1982, App.). Note that, since - z, can be interpreted as the unknown negative mean of the Normal error y, - h(x,,fl), therefore, apart from truncation, the conditional posterior of z, (given the parameters) shares the well-known properties of the posterior of the Normal mean under a Normal prior. This untruncated posterior of z, is Normal, its precision, equal to (w2 + c?)/cJ~w~, is a sum of the ‘sampling precision’ (l/a’) and the ‘prior precision’ (l/o’), and its mean is a weighted average of the ‘prior mean’ (p = $w) and the ‘observation’ (hi - y,) with the weights equal to the proportions of the prior and sampling precisions in the total (posterior) precision. The Bayesian analysis goes along the same lines as under Mi , M2 or M3, i.e. we use (3), (4) and (5) for j = 4. Inferences about individual efficiencies of the observed firms will be based on the marginal posterior densities of Y,‘S(i = 1, . . , N), i.e.,
where P4(zily,,X,,04), the density in (21) should be evaluated at z, = - ln(r,). The posterior density and moments of the efficiency r, of a yet unobserved (or ‘average’) firm f will be calculated from
P~~JIY,X) = li’
J 1 C@(rC/)I-‘.fi(
and
JUr‘j-ly,X)
= E4(e-q’11y,X)
- lnrflrClw~2)
282
J. uan den Broeck et al., Stochastic frontlet- models
where p4( $, w 1y, X ) is the posterior density of ($, w), obtained as a marginal density of p4(04)y, X). Under M4, the Bayesian analysis will require (k + 3)dimensional Monte Carlo integration for calculating posterior results on the parameters and (k + 4)-dimensional integration for obtaining both individual and ‘average’ posterior characteristics of (in)efficiency.
5. Comparing
models and pooling inferences
The Bayesian approach to comparing posterior odds ratio, i.e., p(“JIY,x)
alternative
_ p(“j).P(Ylx,Mj)
P(M) P(YlX, Ml)’
P(MllY>X)
models is often based on the
P-4
where P(M,.(y, X) is the posterior probability of the rth model, P(M,) is its prior probability, and p(y(X, M,.) = p,(ylX) is the marginal data density (under R/r,) obtained by integrating out the parameters. (22) can be formulated simply as follows: posterior odds are equal to prior odds times the Bayes factor. In the case of the comparison between any pair of the models considered previously, the Bayes factor is the ratio of the normalizing constants defined in (5), i.e.,
B,l
=
PAYIX) ___=_
KJ
PdYlX)
The posterior
G’
probability
p(“jlY,x)
=
of the jth model is
KjP(Mj)
I
C KIP(MI). I
(23)
Specification of the prior distributions of the parameters of the models being compared deserves much caution. For estimation purposes within individual models improper priors are often used, provided that they lead to well-defined posteriors (i.e., Kj < + co ). However, the use of an improper prior results in the dependence of K, = p(ylX, Mj) on an arbitrary positive constant. If the same improper prior is defined on the parameters which are common to all compared models, then the same constant appears in all K,‘s, and it cancels out when we calculate Bayes factors and posterior model probabilities. An improper prior defined over model-specific parameters, however, leaves posterior model probabilities undefined. Thus, if we are going to compare only stochastic frontier models with different Erlang distributions of the inefficiency term, i.e., models Mi, M2, and Ma, we can use pj( ej) = p( d,), j = 1,2,3, with p( 0,) improper over all the parameters, since they are all common. If, however, we also consider MO,
J van den Broeck et ul
, Stochastrc frontier models
283
then ,I becomes specific to M, with j = 1,2,3, and in the prior structure p,(B,) = p(/?, a’)p,(Llp, a2) the last factor needs to be a proper p.d.f. Usually the product structure pI (0,) = p(j3, 02)p,( 2) with proper p,(l) and possibly improper p(&02) will be assumed. Here we have implicitly assumed that p,(/?, c2) = p(p, 02), j = 0, 1,2,3, i.e., the prior on the stochastic frontier parameters does not vary over the models under consideration. Analogously, if we want to compare M,, Ml, M2, M3 with n/r,, we can assume p,(6),) = p( 8, c2) p4( II/, W) with p(fi, a2), the same as for the other models (and possibly improper), but p4( $, o) necessarily proper. Having calculated the posterior model probabilities, we do not need to choose among the competing models. For inference purposes we can average out specification uncertainty by mixing individual posterior densities of a quantity of interest, say p, into its overall posterior density,
P(PIY,-~)
= ~W~,IY~)P,(PIYJ).
(24)
Note that this pooling approach amounts to assuming that the conditional distribution of the inefficiency term, z,, is simple (e.g., truncated Normal or exponential), only unknown. Rather than mixing different conditional distributions for each z,, which would lead to a complicated sampling model, we mix over simple models differing in the distribution of all the inefficiency terms. Therefore, each model is in itself tractable, and the mixing is trivially performed at the final stage, as in (24).
6. Efficiency distributions
and prior elicitation
for model-specific
parameters
In the previous sections we have considered for each model M, (j = 1,2,3,4) different conditional distributions of the inefficiency term (z,), which, after taking the Jacobian r,-’ = exp(z,) into account, lead to the following distributions of the efficiency r, = exp( - z,):
(1) p,(ri ItI,) = p,( ril~j), the conditional depending
(2) p,(r,Iy,X,
prior distribution given the parameters, on the model-specific parameters 6, E Aj only, 0,) = p,(rl(y,,xz, 0,) for i E { 1, . . . , N}, the conditional posterior
distribution, Pj(r,ly,X) for i E { 1, . . , N}, the marginal (3) p,(rsly, X) forf$ { 1, . , N}, the predictive (4) tion of the ‘representative firm’ efficiency). The last two distributions they combine all information
posterior distribution, distribution (posterior
distribu-
are of direct interest on the inference stage, since on the individual or ‘average’ efficiency which is
284
J. van den Broeck ef al., Stochasrrc fionfler models
contained both in the prior specification and the given set of data. The influence of the observed data can be assessed by the comparison between these two posterior distributions (unconditional w.r.t. the parameters) and the marginal prior distribution of the efficiency given by
and the last density is evaluated at Zi = - ln(r,), li E (0,11. Note that the marginal prior distribution of the efficiency, which should reflect the economic interpretation of ri and possibly subjective beliefs of a researcher, is a useful starting point for eliciting the (hyperparameters of the) prior distribution of the model-specific parameters, 6j (j = 1,2,3,4). The classes of prior densities ~~(6,) will be simple, allowing straightforward analytical derivation of the marginal densities pj(zi) and pj(r,), and the elicitation of their hyperparameters will be based on quantiles. If there are two free hyperparameters in p,(r,), or equivalently in pJ(zi), then two quantiles are sufficient, and the following questions could be asked: what is the prior probability that a firm (in the industry under consideration) is more than (e.g.) 90% efficient and less than (e.g.) 50% efficient? Eliciting more quantiles would lead to more equations than hyperparameters, and therefore some optimization technique should be used. An alternative strategy, which will be followed here, is to fix all hyperparameters but one at convenient values, and then elicit a value for the remaining hyperparameter through the median only. For practitioners, the median seems an easily elicited quantity as a result of its intuitive definition and its appealing simplicity. In the case of Mi , M2, and MS, the only model-specific parameter is i (6, = 3,, j = 1,2,3), the scale parameter of the Erlang distributed inefficiency term. Here an obvious choice is an inverted gamma prior distribution of A, or equivalently a gamma prior of I- ‘, which is natural conjugate and gives the diffuse prior p(i) oc i-’ as the limiting case. Combining pj(zJi.) =fc(z,lj,i-i) with p,(A- ’ ) = fG(A- ’ 1c0/2, &/2) leads to the following three-parameter invertedbeta marginal prior of z,:
which, in turn, gives for r, E (0, 11: P,(ri)
=
ri ‘&I( - lnr,li0/2~j,&P)
as the prior density of the efficiency itself. This last density is not a standard one, but the quantile elicitation can be done in terms of u, = (1 + 2z,/&,-’ = (1 - 21nr,/&‘, which is beta distributed with parameters (iOj2,j). We have Pr{r, < r*IMJ)
= Pr{u, < (1 - 21nr*/i,))‘~M,J
where a, = (1 - 2 In r*/&‘, and the integral is the incomplete beta function evaluated at a,. In order to further simplify our prior elicitation we propose to take lo = 2j. Then U, has a symmetric beta distribution, and thus its median is i, i.e., Pr ( ui < )} = 3. Now the whole elicitation procedure reduces to specifying the prior median of the efficiency, i.e., such a value r* E (0, 1) that Pr{r, < r*) M,j = f. Since it corresponds to a, = 4, therefore & is given by the equation 3 = (1 - 2 In r*/i,)‘, i.e., E., = - 2 In r*. Summarizing, we propose as a prior on the model-specific parameter j+- ’ in M, (j = l,2,3) p,(L’)
=.fG(3,-‘1j,
where r* is the prior median and r, are
- lnr*), of the efficiency
(25) r,.
The implied marginal
priors of z,
P,(zJ = he(z,ljJ, - In r*)
(26)
p,(r,) = r;‘,fid - In r,lj,j, - lnr*),
(27)
and
respectively. In practice, it seems preferable to choose r* independently of the mode1 entertained, i.e., Pr { Y, < r* IM,} = Pr { r, < r*} = ), so that the model dependence in the prior is confined to the value of co = 2j. This simple way of prior elicitation preserves in a natural way the differences that exist between the models conditionally upon the parameters. From a gamma (Erlang) distribution with shape parameter j for p,(z,(;l) we are led to a marginal prior pj(z,) of the inverted-beta form with essentially the same behaviour, but thicker tails, reflecting the prior uncertainty concerning A. The proposed elicitation rule is thus not only very simple and easy to use in practice, but also avoids distorting the main characteristics of our different Erlang models. In the case of M4, there are two model-specific parameters, namely $ and CO, the parameters of the truncated Normal distribution of z, given 04. Here, due to
286
J. van den Broeck
et al
, Stochatic
frontier models
truncation, the usual Normal-inverted gamma prior specification for (p,o*) (with p = $o), natural-conjugate in the untruncated case, does not lead to an analytically tractable marginal prior of zi. Therefore we propose the following modified version of the usual natural-conjugate prior:
or, in terms of $,
and
PAW-*)=fc(~~-21vo/2, b/2), where c, 0 -2. The values of the right
(29)
is the normalizing constant of the conditional priors of p and $ given factor @($) makes the conditional prior in (28) asymmetric (positive $ are more probable a priori than negative ones) and shifts its mean to of 0:
Although this prior specification clearly favours (as conditional distributions of z, given 0 2 and $) distributions with $ > 0, i.e., with truncation below p = $0, there is enough prior mass in the negative tail of II/ to lead to the half-Normal distribution of z, given me2 and the half-Student marginal distribution of z,:
P4(zJ0m2) = 2f,!q(ZilO,(l+ U)OJ2)Z(Zi 2 O), P‘t(Zi) =
2f;(z,IvoAvoNu + a)b))l(z,
2
Oh
and it is obvious that c, = 2. The latter fact implies that $ and o-* independent and thus
(30) are a priori
0 < E(II/) < $&.
From the density of p4(z,l 6 ‘) we immediately deduce that the difference between the truncated Normal model with unknown $ (a > 0) and the halfNormal (a = 0) is exactly the variance inflation. Note that in fact there are only two quantities we can elicit from (30), namely v0 and 26 = (b/v,)(l + a), where a and b are indistinguishable, and the elicitation can be based on Student t quantiles. If u is distributed as the untruncated univariate Student t with v.
.I oan den Broeck et crl Stochastic
degrees of freedom, Pr(r,
location
< r*IM4}
fiimtw
modc4
281
0, and scale 1, then for r* E (0,11,
= Pr{z,
> - lnr*IMq}
= 2[1 - Prju
I
= 2Prju
> - lnr*/z,(M,}.
- lnr*/7,1M4}],
and the median of r, corresponds to the 0.75 quantile of U, no 75(v0). This quantile is 1 for v0 = 1,0.816 for v0 = 2, and tends to 0.674 as v0 approaches infinity. Eliciting r* equal to the prior median of the efficiency leads to the equation ~e,,~( vO) = - In r*/r,, i.e., r0 = - lnr*/u,.,,(
\lO).
(31)
Since u~.,~(v~) does not change much for v0 > 5, we propose to fix v0 = 10 which gives r. = - yin r* or 7; zz 21n2r*. This corresponds to the following prior density of the model-specific parameters of M4:
P~(ICI,~~~) = 2~(~)f~(~10,~)fc(w-215,(10/(1 + a))ln2r*), and results in the following
marginal
prior densities
p4(z,) = 2f~(z,(10,0,(21n2r*)~1)I(zi
p4(r,) = ffi( I
- lnr,)10,0,(21n2r*)~1)I(0
(32)
of zr and r,:
> 0),
(33)
< ri 5 l),
(34)
which, as a result of our elicitation based on (30), no longer depend on a. As in the Erlang models, the marginal inefficiency distribution has thicker tails than the one given 02, and the influence of integrating out $ with the prior in (32) is to relocate the mode at the origin and to inflate the variance. Again, prior elicitation rules are very simple and the prior densities do not affect the salient features of the model. Although the marginal prior densities of zi and r, do not involve a any longer, posterior and predictive results will, of course, be affected by the value of a. Rather than complicate our elicitation procedure, we could conduct a sensitivity analysis with different values of a, one of which would be a = 0, corresponding to the half-Normal case of Aigner et al. (1977). As a grows, the relative importance of $ in achieving the (fixed) stochastics of z, in (33) becomes larger, and the relative influence of the stochastic nature of w2 decreases. Values for a can be calibrated against the fact that u approximates the prior variance of $.
288
J uan den Broeck et al
7. Empirical implementation
Stochactrc frontier models
to the US electric utility industry
As an illustration, let us consider the data collected by Christensen and Greene (1976) for 123 electric utility companies in the US in 1970. The numbers are listed in the appendix to Greene (1990). Both Christensen and Greene (1976) and Greene (1990) have fitted a cost function suggested by Nerlove (1963) and based on the Cobb-Douglas production function, but generalized to include a term in squared log of output Q, which permits returns to scale to vary with Q. There are three production factors - labour, capital, and fuel ~ with respective prices P1, P,, and P,-, and the specification of the cost function is Yi=
-PO -PllnQI
-B21n2Qz
-P31n(Pt/Pf),
-P41n(PkIPf)l + u, -z,,
(35)
where y, = - In (cost firm i/Pf,).
7. I. Erlang models under diffuse priors In this subsection, we shall only report results on the Erlang models (j = 1,2,3), which allows us to use independent diffuse prior densities on all parameters, since I. is common to all three models. However, as stressed by Ritter and Simar (1993) the usual Jeffreys-type prior on i, p(i) ac iL- ‘, leads to improper posteriors due to the behaviour of this prior close to 2 = 0. Therefore, we bound i strictly away from zero by some c: > 0, which we choose equal to the machine precision. We do not, of course. advocate this as a useful prior density for practical applications, but it provides us with a benchmark case, which is arbitrarily close to that under the ‘obvious’ improper prior of A. Thus, under the prior density
p,(O,) = p(fi,ff2,d) = cY2ib-‘l(A
> e),
with c > 0, we obtain the posterior moments as stated in table 1. These results have required seven-dimensional integration using Monte Carlo with importance sampling. As importance functions we have used a Student t for fi with 10 degrees of freedom and inverted gamma densities on cr2 and 1. Starting from initial importance functions based on, e.g., the mode and the Hessian of the likelihood, we iteratively update the hyperparameters of the importance functions in preliminary runs. Final results are based on 150,000 antithetic drawings for j = 1,2 and on 100,000 antithetic replications for j = 3. The last lines in table 1 contain values for the relative importance of the symmetric component in the posterior out-of-sample error variance, calculated as VR, = var( ufl y, X )/ var(u, - zsly, X), as well as for 7V, = var( us - zrl y, X).
Posterior
moments
Table I of 0* with
Mean -
(0.3388)
;;
00.4272 0295
;: (r=
i
’
VRI T V,
)
7 442
(0 3427)
(0.0028) (0.0424)
0 0304 0.4071
0.0489 0.2480
(00610) (0 0643)
00133
(0.0038)
00912 I I.901
(0.0246) (3 733)
0 0500 23 446
0 5825 0 0228
-
J=3 (s.d
Mean
(s d.)
7467
i.
prior
/=2
/=I
Bo
diffuse
Mean 7361
(0 3647)
(0 0028)I ) (0.042
00311 0.3968
(0.0027) (0.0389)
00.0603 2572
(0 (0.0684) 0638)
00.0698 2419
(0.0587) (0.0678)
00161
(0.0043)
0.0171
(0 0036)
(0.0168) (I 1 368)
0.0345 32 830
0.7059 0.0228
-
(s.d.)
(0.01 16) (12.246)
0.7669 0 0223
Just like in Nerlove (1963) (without composed error) and in Greene (1990) returns to scale tend to go down with output5 and results for p under this diffuse prior are not very different from Greene’s (1990) results. Of course, his ML results are more comparable to the posterior mode than to the posterior mean, and both differ as a result of the fundamental lack of symmetry of the posterior density (and in particular the likelihood function). As a result of the structure of the likelihood function in (12), it is extremely complicated to evaluate correctly for cases where some m,‘s in (8) are large negative numbers. Indeed, (12) involves a product over all firms of c,,‘s which then contain some very small numbers, emanating from the evaluation of the Normal distribution function for large negative arguments. In particular, we found the standard routine CDFN for cumulative Normal density functions in GAUSS to be entirely useless’ for evaluating @(.) with arguments less than - 5, which do occur frequently in our computations, especially forj = 2, 3. Mechanical use of CDFN leads one very far astray in evaluating (12), as it appears that the likelihood shoots off to infinity for CT’-+ 0 (which blows up the arguments of the Normal distribution function). Extensive checking of such routines against the tables in Pearson and Hartley (1954) has led us to use CDFN for arguments > - 5, and in other cases we use the continued fraction expansion ‘Returns to scale posltlve regon
are given
by
ll(p,
+ 2/121nQ,)
and
/IL has most
of the posterior
mass
m the
‘In fact, It gives exactly the ,arne value to G(u) for any s 5 - 5 Whereas this IS accurate for Y = - 5, it IS of course far too large for x appreciably smaller than - 5. Note that this .serrous/~~ blows up values for the hkehhood function m (12) See also GoldsteIn (1991) Neither CDFN nor any of the two polynomial approxlmatlons ltsted in Abramowltz and Stegun (1964) and brlefly mentloned by Greene (1990, p, 148) were found to be of any use for this apphcatlon.
290
J. L’UNden Brorck
frontrev
et al., StorhaJtrc
models
[Abramowitz and Stegun (1964, eq. 26.2.14)] to the order 20. The latter gave us almost perfect accuracy up to values of the argument of - 37 [i.e., for values of @(.) as small as 10-300]. These numerical issues are especially critical for the results presented in table 2, where the integrating constants, crucial for the posterior odds in (22), are tabulated [for c = 1 in (36)]. At the posterior mode ~j, we have also evaluated the posterior density function and the likelihood. Note that the latter value seems to agree with Greene’s (1990) ML evaluation forj = 1, but then decreases slightly with j, which is oery different from his results for a two-parameter gamma withj estimated at 2.45 with standard deviation 1.10. Both our results concerning the likelihood at the posterior mode and the integrating constant (i.e., the likelihood averaged out with the prior (36) where c is taken to be unity) favour the exponential model. Its posterior probability is 0.45, whereas for both j = 2 and j = 3 we get about 0.27. In contrast, Greene (1990) finds a loglikelihood value of 112.72 for the gamma case with free j, which would imply very strong evidence against the exponential. Although we fix j and our results are thus not formally comparable, we find it quite unlikely that the (otherwise very flat) likelihood would shoot up to such a value in between our models Mz and M3. In our view, Greene’s (1990) findings might have been crucially influenced by the hard numerical problems that we have referred to in the previous paragraph. See also Ritter and Simar (1993). Finally, we present posterior results for the efficiency of firms within the sample, r,, and the out-of-sample efficiency, r/. Posterior densities p,( rsly, X) are graphed in fig. 1 for j = 1,2,3, from which we clearly see that for a greater value
----
J=2
_---_
-Erlang
I=3
r-nixed
9.60
7.20
2.40
040
0.52
Fig. 1. Posterior
0.64
0.76
efficiencies,
out-of-sample
0.80
wth
diffuse prior
100
J van den Brncck et al
Evtdence
log K, logP,(QY,~) logl,(H,ly.W P(M,) P(M,lY,X)
J=
r/
Table 2 on model probabtlmes
291
wtth drffuse prtor
/= 1
J=2
J=3
49 403 60.258 67.005
48.886 59 029 66.127 l/3 0.2706
48.903 58.370 65.747 l/3 0.2754
l/3 0.4541
Posterior
‘1 r1 r-2 r3 r4 rs Medtan
, Stochastrc frontier models
moments
Table 3 of effictenctes
1
with dtffuse prior.
J=2
J=3
)
Mean
(sd.)
Mean
(s.d
0.9169 0.6 169 0 9695 0.9302 0.8901 0.9582 094
(0.0807) (0.0777) (0.0301) (0.0609) (0.0878) (0.0400)
0.9077 0.6645 0.9530 0.92 12 0.9052 0.94 19 093
(0.0694) (0.1246) (0.0329) (0.0550) (0.0660) (0.0407)
Mean
(sd.)
0.9038 0 7435 0.9392 0.9058 0.8960 0.9277 0.9 15
(0.0618) (0.1007) (0.0360) (0.0591) (0.0654) (0 0432)
ofj the modes are shifted to the left, in agreement with the properties of the sampling model. Using the posterior model probabilities of table 2, the overall posterior density of out-of-sample efficiency for Erlang models is also plotted. Posterior moments for rf computed as in (18) and for individual efficiencies r, (i = 1, . , 5) of the first five firms in the sample’ are presented in table 3. Just as the out-of-sample efficiency is more concentrated around the mean for larger j, the means of individual firm efficiencies are also less spread out as j goes up. Of course, for very large values of j the Erlang distribution tends to a Normal, which will become indistinguishable from the symmetric Normal on v,. All the stochastics will then be attributed to determining the frontier itself, and all firms will tend to be equally efficient, although the level of efficiency will no longer be identified. Comparing table 3 with Greene’s (1990) results, we note some interesting differences, especially with his gamma model, which seems very far off (both from our results and his results using other distributions) for firms 1 and 2. Remark that these firms are, respectively, the least and the most efficient of the subset of five firms considered here. ‘These are, respectively, and Bangor Hydro.
Cal. Pac. Utthty,
Montana
Power,
Upper
Pen
Pwr., Mt. Carmel
Pub.,
292
7.2. Overall
J van den Brorck
e/ al., Sfochastrc
frontrer
models
analysis
In order to combine models with varying sets of model-specific parameters 6,, we shall use the prior elicitation rules derived in section 6. For (j3, r~‘) we continue to use the diffuse prior p( 8, a’) = K’. We now no longer compare sampling models under the same prior assumptions (as in the previous subsection), but we compare Bayesian models, i.e., combinations of a sampling model and a model-specific prior density. The proper priors in (25) and (28), (29) with a = 1, were chosen to preserve the shape and characteristics of the corresponding sampling models and are quite different. In order to isolate the modelspecific elements, it is instructive to consider p,(zi) = Sn,pj(z~lGj)P,(fi,)dG,. Of course, the z,‘s are no longer independent when 6, is integrated out, so that their contribution to the jth Bayesian model marginalized with respect to 6, is not exactly fly= tp,(z,), but the marginal priors of zI, or rather of r, = exp( - z,), can nevertheless guide our intuition. Figs. 2 and 3 plot the priors of r, forj = 1,2,3,4, using (27) and (34) with r* = 0.875 and 0.5. Clearly, prior 1 is most conservative, since its shape is least affected by going from r* = 0.875 to r* = 0.5, which are very different values from an economic point of view. From figs. 2 and 3 it seems that M3 should receive most of the posterior probability mass if prior and sample information are in accordance, i.e., if values of r, close to the prior median r* correspond to high values of p,( rrl y, X) obtained under a diffuse prior. On the other hand, if prior and sample information are in conflict, the posterior ranking should be relatively more in favour of MI.
).OO
0.20
0.40
Fig. 2 Prior efficlencles,
0.60
mformatlve
0.80
1 .oo
prior with Y* = 0 875
293
0.20
0.00
Fig. 3. Pnor
Let us first present
0.40
effiaencles,
in tables 4-6
0.60
mformatlve
0.80
pnor
wth
1.00
r* = 0 5.
the results for the prior median efficiency median efficiencies in the diffuse Erlang case (see table 3), prior and likelihood should roughly be in accordance with each other here. Indeed, table 5 now attributes the bulk of the posterior probability mass to M,, the Erlang model where the marginal prior on r, is most concentrated around its mode. With the exception of M,, which receives about 7% of the posterior probability, the rest of the models get little posterior probability, in particular MO, the model which imposes full efficiency. Although the relative ranking of the Erlang models is very different from the diffuse case in subsection 7.1, the posterior moments of /3* are hardly affected by the informative priors, as is obvious from comparing tables 1 and 4. In fact, the posterior moments of /I are close to the ones for the baseline model MO under all the efficiency distributions considered. Of course, the posterior mean of 0’ decreases as soon as a composed error is assumed. The total error variance TV,, however, is quite stable across models. Comparison with the diffuse prior case in table 1 shows that variance ratios VR, are essentially unaffected by the added prior information, which does not contradict the data information here. Results for the truncated Normal model are more different from Greene’s (1990) ML estimates than for the Erlang models (barring the log-likelihood value discussed in the previous subsection). Greene finds a large negative value for /L (with an even much larger asymptotic standard error), whereas we find r* = 0.875. In view of the posterior
G
w-2 VRJ TV,
co2
’
1 00213
0.02 13
u2
E. ;.-
0 2462 0.0316 0.0792
(0.0028)
(0.0027) (0.0674) (00621)
(0 3397) (0.0386)
- 7.205 0.3860
!: 84
(s.d.)
Mean
J=o
J=l
0 5562 0 0232
0 0954 11.273
0.0129
- 7.479 0.4276 0 0295 0.2492 0.0449
Mean
Posterior
(0.0242) (3 312)
(0.0036)
0.6721 0.0223
0.0557 19.302
0.0150
- 7.438 0.4145 0.0300 0.2495 0.0573
Mean
j=2
Table 4 of 0, with informative
(0.3448) (0.0428) (0.0028) (0.0652) (0.06 19)
(s.d.)
moments
(0.0136) (5.907)
(0 0034)
(0.3898) (0.0433) (0.0029) (00717) (0.0626)
(s.d.)
priors
0.7760 0 0220
0.0348 32.184
00171
~ 7.342 0.3927 00313 0.247 1 0.0748
Mean
J=3
on 6, (r* = 0.875)
(0.0105) (12.429)
(0.0033)
(0.3400) (0.0436) (0.0030) (0.0663) (0 0668)
(sd.)
0.4130 0.0190 62.80
0.0131
- 7.374 0.3988 0.0311 0.2428 0.0654
Mean
j=4(a=
(0.5792) (0.0083) (23.89)
(0.0033)
(0.3187) (00391) (0.0027) (0.0653) (0 0658)
(s.d.)
1)
J van den Broeck rt al
Model probabihttes
log K, _ logP,(Q,lY,‘U P(M,) P(M,IY?X)
Posterror
j=
48.070 61.775 l/5 0.0013
50.294 61.161 115 00116
]=o Mean (s d.) 1
r-1 r2 r3 r4 r5 Medtan
1 (0) 1 (0) 1
(0)
(0)
1 (0) 1 (0) rl
1
Table 5 with informattve
j=O
moments
r/
, Stochastic frontier models
of effictenctes
prtors
1
j=3
52.066 62.608 l/5 0.0683
54.654 64.5 18 l/5 0.9089
j=2
J=t
on 6, (r* = 0 875).
j=2
Table 6 with mformattve
295
prtors
j = 4 (a = 1) 50.134 58.452 115 0.0099
on S, (r* = 0.875).
j=3
j = 4 (a = 1)
Mean (sd.)
Mean (sd.)
Mean (sd.)
Mean (s.d.)
0.91 0.55 0 97 0 93 089 0.96 094
0.90 0.66 0.95 0.92 0 89 094 0.91
0.90 0.68 0.94 0.90 0.89 0.93 0.91
0.83 0.70 0.96 0.87 0.84 0.94 0 83
(0.08) (0.09) (0.03) (0.06) (008) (0 04)
(0.07) (0.10) (003) (0.06) (0.09) (0.04)
(0.06) (0.10) (004) (006) (006) (0.05)
(0 10) (0.08) (0.04) (008) (0.09) (0.05)
a positive mean of I/, with a small posterior standard deviation of the same order of magnitude. The resulting truncated Normal distribution of Greene (1990) is close to the exponential distribution (see his fig. l), which seems at odds with his strong support for the gamma model with shape parameter P estimated at 2.45. Indeed, a positive posterior mean for $ renders the truncated Normal model closer to the Erlang cases with j > 1, which receive most support in our findings. The posterior mean for CJ’ seems in line with his estimate for the half-Normal, but is about twice his estimated value for the truncated Normal (a + 0).8 Furthermore, Greene’s estimate of w’, which is 0.1674, is roughly nine times as large as the posterior mean we find for the case with a = 1. The latter is, of course, linked to his large negative estimate for p. The likelihood function seems to be very flat in the direction of p and behaves much better when parameterized in terms of $ = P/O. In addition, the prior on $ in (28) helps in resolving problems of identification. ‘Table 1 m Greene (1990) IS not easy to read. In hts notatton, u2 IS the vartance of L’= at for the gamma models, but 6’ = 0: + 0: m the truncated Normal cases where ui, whtch corresponds to our w’, IS larger than var(u) due to truncatton In addrtion, the value presented for his i. = 6,/e, does not seem in accordance wtth hts esttmates for ai and CT:m the general truncated Normal model.
Posterior moments for within-sample and out-of-sample efficiencies are close to the diffuse case for the Erlang models, with the exception of firm 1, which displays a great sensitivity to prior and sampling model assumptions. As it is by far the least efficient of the firms considered and thus the furthest out in the tail of the distribution of z,, it should, indeed, be most affected by different forms of tail behaviour of the efficiency distribution. Although the individual firm efficiencies are not far from those of the Erlang models, the out-of-sample efficiency of the truncated Normal case is shifted away from full efficiency here (see fig. 4). Comparing table 6 to Greene’s point estimates of inefficiency, we observe again the very high values he finds for firms 1 and 2 with the gamma model and the relatively close correspondence for the exponential model. In addition, his half-Normal model leads to estimates that are similar to our posterior means for j = 3, but the truncated Normal case of Greene produces a much larger spread of individual firm efficiencies than found in table 6 for j = 4. The latter is, of course, to be expected as his truncated Normal model behaves like as exponential and induces a relatively flat efficiency distribution. Let us now shift the prior median efficiency to a completely different value, r* = 0.5, which is no longer in line with the data information. Tables 7-9 summarize the posterior findings where the probabilities of the Erlang models are now much closer, in line with our expectations based on figs. 2 and 3. The large conflict between the prior and the sample information for Ml-M4 results in M,, implicitly assuming full efficiency (which is much more in line with the data) taking most of the posterior probability mass. The disaccordance between prior and data information can be illustrated by computing Bayes factors for the Bayesian models with common sampling distribution and prior structure, but the different values of r* used here, defined by BT = p,(ylX, r* = 0.875)/ p,(ylX, r* = 0.5). These Bayes factors clearly favour Y* = 0.875 and are given by 218, 1077,408 1, and 3 17.426 forj = 1,2,3, and 4, respectively. This also indicates that ,j = 1 corresponds to the most conservative model. Comparing table 7 with table 4, we note that less of the stochastics is attributed to the frontier in (2) and more to the inefficiency, as could be expected. This is evidenced by the drop in variance ratios, especially for higher j. Note that this fact reverses the ordering of VR, among the Erlang models. We did not compute VR, for M4, but the more than ten-fold increase in the posterior mean of (0’ combined with a decrease in the mean of 0’ definitely points in the same direction. In addition, the now negative posterior mean of Jo requires large values of u2.9
‘For M4 wth r* = 0 5 the Monte Carlo lntegratlon performed better m terms of (p,(u) than m (I/I, CO) A careful analysis of this issue IS a topic of further research Goldstem (1991) proposes to w h ere p, IS the Jth central moment of the composed error parametenze m terms of (/I,~2,~j,$),
(c, - z,).
04133 0.0259
co2 us L vR~ TV/
F
I 00213
(0 0202) (1.519)
0.1199 8.586
(0.0641) (0 0028) (0 0603)
’
1. i.
(0.0028)
0 0289 2447 0.0362
(0.3295) (0.0418)
(sd)
(0.0027)
0.0213
o2
(0.0674) (0 0027) (00621)
- 7 526 0 4398
Mean
/=l
00107
0.03 0 2462 16 0.0792
(0 3397) (0.0386)
- 7 205 0 3860
k IA
(s d.)
Mean
J=o
0 3658 0.0254
0.0870 11 762
0.0093
0.2411 0.0294 0 0385
~ 7 532 0.4324
Mean
J=2
(0.0128) (1.856)
(0.0028)
(0.0026) (0.0632) (0.0599)
(0.3193) (0 0385)
(sd.)
0.2971 0 0259
0.075 1 13.573
0.0077
0.23 16 0.0294 0.0432
- 1.529 0.43 11
Mean
J=3
Table 7 Posterior moments of 0, wtth mformatlve priors on 6, (r* = 0.5).
(0.0104) (1.872)
(0.0029)
(0.0028) (0.0654) (0 0620)
(0 3197) (0.0405)
(s d.)
- 0.9094 0.2016 5.258
0 0082
0.2270 0 0290 0 0326
- 7.478 0.4412
Mean
/=4(a=
(0.2844) (0 0501) (1.270)
(0 0022)
(0.0657) (0.0027) (0.0631)
(0.3333) (0.0405)
(s.d.)
I)
J. vm den Brorck et al.. Stochastic frontlet- models
298
Table 8 wtth informative
Model probabtlities J=o
log K, _
Posterior
44.9 10 56.067 t/5 0.0334
moments
of efficienctes
j=O Mean (sd.) rJ
1 (0)
rl rz r3 r4 r5 Median
1
(0)
1 1 1 1
(0) (0) (0) (0)
rJ
j=3
45.084 55.783 115 0 0397
46.340 56.691 115 0.1396
1
Table 9 wtth mformative
prtors
J=2
1
j=
on 6, (r* = 0.5).
j=2
J=l
48 070 61 775 115 0.7873
tog P,(~,lY>X) JYM,) P(M,lxX)
priors
J = 4 (a = 1) 37.466 48.892 l/5 0 00002
on 6, (r* = 0.5). /=3
J=4(0=
1)
Mean (sd.)
Mean (sd.)
Mean (sd.)
Mean (sd.)
089 0 60 0.97 0.92 0.87 0.96 0.92
0.85 0.55 0.95 0.88 0.8 1 0.93 0.86
0.81 050 0.92 0.83 0.77 089 082
0.68 0.52 0.97 0.90 0.81 0.95 0.65
,= 1
(0.10) (0.07) (0.03) (0.06) (0.09) (0.04)
____ )=2
(0.10) (0.06) (0.04) (0.07) (0.08) (0.05)
___-- J=3
(0.10) (005) (0.05) (0.07) (0.08) (006)
(0 16) (0.04) (0.03) (0.07) (009) (0.04)
_.-._ I=4
10.50
8.40
-
6.30
-
4.20
-
,“\, : 1 ,’ : ,‘,_ - _ -1, : :‘\ / 0 t( / 15 ,:’ ;: /Y /
2.10
,,r---;<’ __--0.40
..-..
0.52
Ftg 4. Posterior
,
_.c
,T._,’ \,/,’
./ ,~~~~,, 0.00
/;<;Y.,__~,
\
//
-
i, 1,
,:’ ’
:I :: ,b 1’ \
‘L’ /’ )<<*
*-_-‘___-c 0.64
etlictenctes,
0.76
out-of-sample
0.88
100
with r* = 0.875.
As the prior information is increasingly conservative for j = 4,3,2,1, the out-of-sample efficiency r/ in table 9 is not affected much for j = 1, but more and more as j increases. Fig. 5 graphically displays the posterior densities of rI for r* = 0.5. Comparing figs. 4 and 5, the relative robustness of the exponential
J ~‘an den Brorck
J= 1
____
CI al I Stochastrc
J=2
-----
frantwr
models
299
_._._ j=4
J=3
8’50 I
040
0.52
Fig 5. Postenor
064
efficlencles,
0.76
out-of-sample
0.88
wth
r* = 0.5.
model with respect to large changes in r* is clearly illustrated. Finally, note that for firm 1 the prior does not contradict the sample now, so that its individual efficiency is more concentrated than in the case with r* = 0.875, and its mean is increasingly drawn towards the prior median asj grows (i.e., as more prior mass is concentrated around the median; see fig. 3).
8. Conclusion A Bayesian analysis of stochastic frontier models was shown to be both theoretically and practically feasible. A simple application of the rules of probability calculus leads to posterior densities of efficiencies, both within-sample (firm-specific) and out-of-sample (average), where parameter uncertainty is entirely taken care of. This paradigm thus allows direct posterior inference on firm-specific efficiencies, avoiding the much criticized two-step procedure of Jondrow et al. (1982). The difficult choice of a particular sampling model for the inefficiency error term is avoided by mixing over different models, reflecting the spectrum of distributions proposed in the literature. Inherent to this mixing and the associated computation of posterior model probabilities is the fact that proper priors are required on those parameters that are not common to all models considered. We propose a set of prior structures that preserve the mam characteristics of the different sampling distributions. In addition, they allow for very simple prior elicitation rules, based only on one number, the prior median efficiency. We apply these results to the analysis of 123 US electric utility companies in 1970, a data set used and listed in Greene (1990). We briefly treat some
J. van den Broeck et al., StochaArrc frontw mode1.s
300
numerical issues that appear quite crucial for this application, and were partly overlooked in the previous literature. The empirical analysis relies on Monte Carlo integration with importance sampling for all parameters and the efficiency term. Although this is numerically not a trivial exercise,” it provides us with a wealth of information; the actual posterior densities of, e.g., average efficiency can be plotted, taking into account all prior and sample information and wtth parameter and model uncertainty averaged out. We highlight the main differences with the sampling theoretical approach in Greene (1990) and perform the analysis both under diffuse and proper priors. In the latter case, the choice of prior median efficiency, r*, naturally affects our results. However, as expected, the exponential model is least affected by shifting from a prior median in accordance with the data to one in severe conflict with sample information. In this particular application, the truncated Normal model is found to be most sensitive to the choice of r* and turns out to be the least favoured of all composed error models considered.
Appendix A Joint sphericity
qf the disturbances
of the stochastic,frontier
Here we will show that, under the usual diffuse prior of g2, all the posterior results (on quantities other than the scale parameter c of u,‘s) are perfectly robust w.r.t. departures from Normality within the class of joint spherical distributions of v = (ur , . . , uN)‘. Assume that the vector v of N random disturbances of the stochastic frontier (2) has a continuous spherical distribution with scale parameter cr, and is independent of all x,‘s and z*‘s. Therefore, under any M, (j = 1,2,3,4), the density of v takes the form
P,(4X,Zl, .
, z,,8,6,,0)
= p(vla) = KNg,(am2v’u),
where 6, groups the model-specific nonnegative function such that
N’2m1gN(u)du = T(N/2)KN”,
parameters
of M, and
(A.11 g(.) is a known
(A.2)
“A full lmplementatlon of our Gauss-386 programmes with the computation of a posterior density plot for r, and first- and second-order moments for r, through r5 ran at the rate of 10,000 antlthetlc drawmgs per hour on a 386-33 personal computer with coprocessor Typical runs were of the order of 100.000 drawmgs.
J ccm den Broeck rt ul , StochaJtrc
frontrrr
301
models
which is both necessary and sufficient to make a-Ny,(a-2v’v) a proper, normalized density function; see, e.g., Dickey and Chen (1985). Assume that gN(.) is nor indexed by c and consider the following (improper) prior structure for the parameters of M,: (A.3) where c is a positive constant and p,( b, 6,) is functionally independent of 0. Thus the joint (improper) density of v and 0 given x,‘s, z,‘s and (fi,S,) is given by P,(V,OlX,Z,,
‘. . 3 ZN,#&hJ) = p(vlo)p(o)
= com”m’g,(am*v’v).
Now the transformation from 0 to u = 0 -*r’v, with the Jacobian (1/2u)Jv’vlu, can be used to facilitate the integration over 0 E (0, + cc ). Since this transformation does not affect gN(.), which is not indexed by 6, we can use (A.2) in order to obtain the (improper) density of v: pJ(t’lX,z,,
. . , ZN, p,S,) =
p(v) = (c,2)(a’l;)~N’2jllN’z~1g~(U)dU 0 =
(c/2)r(N/2)Z-N’*(v’v)-N’*.
(A.4)
Remark that the form of this density is not affected by the particular form of the function gN(‘) and remains exactly the same for any gN(‘) fulfilling (A.2). Since (A.4) can also be viewed as the density of the observation vector y given (x,z,, . ZN, p,d,), the joint density p,(y,z,,
. . . > zN, b~~jlx)=Pj(.blx,zl,~~. x
P,(ZI
> zN,B,6~)
1 . . . > zN16j)P~(b,6~)
and the posterior pj(Z,, . . , zN,p,6,ly,X) remain unaffected by the change from Normal to any other spherical distribution of v. Ifjoint sphericity of (v’ v~)’ is assumed, then, under (A.3) full robustness of the predictive density of yf given (y, X, xs) holds as well. The result presented here was first proven in the general case of nonlinear regression with elliptical errors by Osiewalski and Steel (1993).
Appendix Probability
B density jimctions
A k-variate Normal density on x E IRk with mean vector b E lRk and PDS k x k covariance matrix C: ,fk(xIb,C)
= [(27~)~lC/]~‘/*
exp - f(x - b)‘C’(x
- b).
J van den Broeck et al., Srochasric frontier models
302
A k-variate Student t density on x E IRk with r > 0 degrees of freedom, vector b E lRk, and PDS k x k precision matrix A:
f
;Xr (
A gamma
U(r + WI I ) ‘,
density
A)
=
qr/2)(rn)k’2
(I+ 1 k)/2
,A,
1/Z 1 + ‘(x r
- b)‘A(x - b)
on z > 0 with a, b > 0:
&(zja, b) = b”[r(a)]-‘z”-’ A beta density
location
exp( - bz).
on v E (0, c) with a, b > 0:
A three-parameter inverted beta or beta prime density [see Zellner (1971, p. 376)]:
.Mwla>b,c) =
on w > 0 with a, b, c > 0
c;~~;Pb))(w/c)b-l(l +W/C)-(a+b!
References Abramowttz, M. and LA. Stegun, 1964, Handbook of mathematical functions (Dover, New York, NY). Aigner, D., C A.K. Lovell, and P. Schmidt, 1977, Formulatton and estimation of stochastic frontier production function models, Journal of Econometrics 6, 21-37. Bauer, P.W., 1990, Recent developments in the econometric estimatton of frontters, Journal of Econometrics 46, 39956. Beckers, D.E and C J. Hammond, 1987, A tractable hkehhood functton for the normal-gamma stochastic frontier model, Economics Letters 24, 33-38. Broeck, J. van den, F. Broeckx, and L. Kaufman, 1991, Decomposing stochastic frontier efficiency into secular and organisatorial efficiency, Journal of Computational and Apphed Mathematics 37, 251-264. Christensen, L.R. and W.H. Greene, 1976, Economies of scale m US electric power generation, Journal of Pohtical Economy 84, 6555676. Dickey, P. and C.-H. Chen, 1985, Direct subjective-probability modelling usmg ellipsotdal distributions, m: J.M. Bernardo, M H. DeGroot, D.V. Lmdley, and A.F M Smith, eds., Bayestan stattsttcs 2 (North-Holland, Amsterdam). Goldstem, H.E, 1991, On the COLS moment estimation method for stochastic frontier production functions, Mimeo. (Umverstty of Oslo, Oslo). Greene, W.H , 1990, A gamma-distributed stochastic frontter model, Journal of Econometrics 46, 141-163. Jondrow, J , C.A.K. Lovell, IS Materov, and P. Schmidt, 1982, On the estimation of technical mefficiency m the stochasttc frontier production function model, Journal of Econometrtcs 19, 233-238
J. van den Brorck et al.. Stochustrc frontwr
models
303
Johnson, N.L. and S. Katz, 1970, Dtstributrons m statisttcs. Continuous univartate dtstrrbuttons 1 (Houghton Mrfflin, Boston, MA) Meeusen, W and J. van den Broeck, 1977, Effictency esttmatton from CobbbDouglas productton functions with composed error, lnternattonal Economrc Revrew 8, 435-444. Nerlove, M., 1963, Returns to scale m electrrctty supply, m: CF Chrtst, ed., Measurement m economics (Stanford University Press, Stanford, CA) Ostewalskt, J. and M.F J. Steel, 1993, Robust Bayestan inference in elhptrcal regressron models, Journal of Econometrtcs 57, 3455363. Pearson, ES. and H.0 Hartley, 1954, Btometrtka tables for stattsttcrans, Vol 1 (Cambrrdge Umversrty Press, Cambrtdge). Rttter, C. and L Simar, 1993, Two notes on the Normal-Gamma stochastrc frontter model: A simple analysis of the American utility data, and correctrons to the results m Greene (1990) Mrmeo. (Universtty of Louvam, Louvam-la-Neuve) Stevenson, R.E , 1980, Ltkehhood functtons for generalized stochastic frontier estrmatton, Journal of Econometrrcs 13, 57-66. Zellner, A., 1971, An introduction to Bayestan Inference in econometrrcs (Wiley, New York, NY)