Health Policy 87 (2008) 1–7
Measuring efficiency in clinical departments

Jon Magnussen a,*, Kari Nyland b

a Department of Public Health and General Practice, Norwegian University of Science and Technology (NTNU), 7489 Trondheim, Norway
b Trondheim Business School, Trondheim, Norway

Contract grant sponsor: The Norwegian Ministry of Health and Social Affairs, through SINTEF Health Research.
* Corresponding author. Tel.: +47 73597569; fax: +47 73597577. E-mail address: [email protected] (J. Magnussen).
Abstract

Objectives: This paper explores the possibilities and limitations of obtaining and interpreting efficiency measures at the level of the clinical department. We discuss the limitations of case-mix groupings such as the diagnosis related groups (DRGs) at this level.

Methods: Hospital costs are allocated to clinical departments and efficiency is measured using data envelopment analysis (DEA). Outputs are measured as the number of discharges, adjusted for case-mix using DRGs. The effect of department vs. hospital on the level of measured efficiency is analysed using a simple fixed effects regression model.

Results: We find that measured efficiency depends critically on the chosen model specification. Some department types, notably children's departments, have systematically lower levels of measured efficiency.

Conclusions: Our findings have implications for the monitoring and financing of clinical departments. DRG-type instruments should be applied with caution, both for monitoring and financing purposes, at the departmental level.

Keywords: Hospital efficiency; Data envelopment analysis; DRG; Hospital costs
1. Introduction

Measures of hospital efficiency play an important role in the evaluation of health policy initiatives [1,2] and in comparative analyses of health systems [3]. They can also be used in the monitoring of resource utilisation, although the value of this is sometimes questioned [4]. A common feature of these applications, however, is that efficiency is measured at the hospital level. Thus most measures of hospital efficiency are "average" measures over a set of clinical departments that may perform quite differently (two publications in Danish [5,6] are notable exceptions). From a policy point of view this may not always be satisfactory. In many systems we find that decision-making power is increasingly delegated to the departmental level. Clinical departments thus become "firms within a firm", and there is consequently a demand for performance measures that are relevant at this level.

The main aim of this paper is to discuss the applicability of measures of efficiency at the level of the clinical department. The discussion is set within the
context of the Norwegian health care sector, where the internal organisation of hospitals has gradually shifted from a centralised to a decentralised model of decision-making. In 2003, 98% of all clinical departments had their own budget, while the share of departments that use internal pricing for ancillary services increased from 12% to 34% between 1999 and 2003 [7]. The financing of hospital services is also increasingly aimed at providing incentives at the departmental level: the share of departments with an element of DRG-based financing increased from 21% to 63% in the same period. Thus there is substantial autonomy at the budgetary level, a development that is supported by organisational models with an increased focus on management issues and departmental leadership. A question, then, is whether this departmental autonomy is (or can be) accompanied by monitoring instruments such as measures of efficiency.

In this study we focus on two issues related to this question. The first is the possibility of actually obtaining valid measures of efficiency at the departmental level. A persistent challenge in the economic analysis of hospitals has always been to account for the multi-product nature of hospital production [8]. The usefulness of department-based measures of efficiency will therefore depend critically on their robustness to the choice of output specification. Our second issue is the extent of variation in measured efficiency between different department types. Systematic differences between department types may be an indication of inaccuracies in the measurement of outputs, but also of differences in production technologies that are not captured in more aggregated models. Our focus is not so much on the need for efficiency measures in internal management (where a number of tools, such as balanced scorecards, are available) as on the possibility of using DRG-type output measures to provide departmental measures of efficiency for monitoring and financing purposes.

This paper is structured as follows. In Section 2 we discuss the clinical department as the unit of analysis, describe the methodology used to calculate costs and activity, and present some descriptive data. Efficiency measures are obtained using the non-parametric approach of data envelopment analysis (DEA), described in Section 3. Section 4 then describes differences in measures of departmental cost efficiency, while Section 5 provides the concluding discussion.
2. Clinical departments as decision making units

The organisational structure of hospitals varies, but hospitals can broadly be characterised by three different types of activities: administrative and technical support, ancillary services and clinical departments. We choose a framework in which clinical departments serve as the core units of a hospital, in the sense that the decisions made in the clinical departments determine the need for, and use of, resources at both the administrative/technical support level and in the ancillary departments. Consequently we view each clinical department as a single decision making unit. When we interpret measures of efficiency at the departmental level we need, however, to take into account that the performance of each department will depend on the performance of both the administrative/technical support departments and the ancillary departments. We return to this point in more detail below.

Hospitals treat a variety of patients using several different inputs. There is no consensus on how to measure hospital outputs; generally the chosen output vector depends on the available data and, to some extent, on the problem to be analysed [8]. In this paper we take as our starting point the number of treated patients and correct this for differences in case-mix using the diagnosis related groups (DRGs). While there are other ways of adjusting for case-mix, the use of DRGs is particularly relevant in the Norwegian context, due to the system's integral part in the financing of the hospital sector [1]. As noted, the income of clinical departments increasingly depends on the DRG-adjusted number of discharges. For monitoring purposes there is also increased interest in comparing average costs per DRG across departments.

We also extend the basic DRG framework along two lines. First, we separate out day care as an output. The motivation for this is that day care patients, not being in need of overnight beds, represent a conceptually different output than traditional inpatients.
Second, we separate out long term care patient days as an output. Again, these are patients who are treated with a different technology than traditional inpatients. Moving from one to three outputs is admittedly an ad hoc decision; we test the necessity of this in the next section. For now we simply propose to measure hospital output in three alternative ways:

• A single output measure, O, measured as the weighted sum of discharges.
• Two outputs, OI and OD, measuring, respectively, the weighted sum of inpatient and day care discharges.
• Three outputs, OI, OD and OL, measuring, respectively, the weighted sum of inpatient discharges, day care discharges and long term care days.

We do not have access to labour or capital inputs at the departmental level. Inputs are therefore measured as total operating costs, excluding capital costs. The present accounting structure of hospitals gives rise to two problems. First, the hospital accounting systems do not always coincide with the structure of departments. Hence some costs must be attributed to the clinical departments based on assessments of, e.g., the distribution of physician services between outpatient and inpatient activity. Second, we cannot measure the costs the clinical departments generate when buying/using services from ancillary departments and administrative support departments (as noted in the introduction, some hospitals do have a system of internal pricing, but this is not reflected in our data). To overcome these problems we have chosen to follow the procedure used in the calculation of cost weights for the Norwegian DRG system. Roughly, this means that we follow a top-down procedure of cost allocation: administrative and technical support costs are allocated to ancillary and clinical departments based on these departments' perceived relative use of the services (for example, administrative costs are allocated based on the number of FTEs, maintenance costs based on square metres, etc.). The costs of services delivered from the ancillary departments to the clinical departments are first distributed to the DRGs according to measures of relative service use between the different groups, and then allocated to the clinical departments based on the particular case-mix of each department. A schematic sketch of this procedure is given after the list of input measures below.

There are three critical issues here. First, we transfer (in)efficiencies in the administrative/technical support departments as well as in the ancillary departments to the clinical departments; i.e., we may uncover efficiency differences between the clinical departments that in reality are related to actions and behaviour that are outside their control. Second, the allocation of ancillary costs will depend on the way costs have been allocated to the DRGs; any systematic error in this allocation will automatically be passed on to the clinical departments. Third, using a top-down rather than a bottom-up approach increases the potential for measurement error in departmental costs. We return to these issues below. For now we propose three alternative input measures:

• A single input measure, C, measuring total costs.
• Two inputs, CO and CAC, measuring overhead costs and ancillary/clinical department costs, respectively.
• Three inputs, CO, CA and CC, measuring overhead, ancillary and clinical department costs, respectively.

Data are collected from 16 hospitals, out of a potential 50, in the year 2000. We utilise the data used to calculate the Norwegian DRG cost weights, and hospitals were chosen for their ability to deliver high quality data. While not randomly selected, they comprise teaching hospitals, medium sized central hospitals (that is, hospitals with more than three specialities) and small local hospitals. Only departments believed to have a case-mix that could accurately be described by the DRG system were included. In total, 146 clinical departments are included in the analysis. Table 1 shows summary statistics for the various inputs and outputs included in the analysis.
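To make the procedure concrete, the following is a minimal sketch of the two allocation steps and of the DRG-weighting of discharges. The department names, allocation keys, DRG weights and cost figures are illustrative assumptions of ours; they are not the actual Norwegian cost-weight data.

```python
# Schematic sketch of the top-down cost allocation described above.
# All keys, weights and figures are illustrative, not the study's data.
import pandas as pd

# Step 1: overhead allocated by an activity key (e.g. FTE share for
# administration, square metres for maintenance).
overhead = {"administration": 10_000, "maintenance": 4_000}   # 1000 NOK
fte_share = pd.Series({"surgery": 0.5, "medicine": 0.3, "children": 0.2})
sqm_share = pd.Series({"surgery": 0.4, "medicine": 0.4, "children": 0.2})
allocated_overhead = (overhead["administration"] * fte_share
                      + overhead["maintenance"] * sqm_share)

# Step 2: ancillary (lab/radiology) costs are first distributed to DRGs by
# relative service use, then to departments via each department's case mix.
ancillary_total = 8_000
drg_service_share = pd.Series({"drg_a": 0.6, "drg_b": 0.4})
ancillary_per_drg = ancillary_total * drg_service_share
# discharges by department (rows) and DRG (columns)
cases = pd.DataFrame({"drg_a": [100, 40, 10], "drg_b": [20, 80, 50]},
                     index=["surgery", "medicine", "children"])
ancillary_cost = (cases / cases.sum()) @ ancillary_per_drg

dept_costs = allocated_overhead + ancillary_cost
print(dept_costs)

# DRG-weighted output: discharges weighted by (hypothetical) cost weights.
weights = pd.Series({"drg_a": 1.2, "drg_b": 0.8})
print(cases @ weights)
```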
Table 1
Summary statistics (costs in 1000 NOK)

                              N     Minimum    Maximum      Mean      S.D.
Total costs (C)              146      6,573    416,051    88,711    73,190
Overhead costs (CO)          146      1,473    112,200    22,307    19,120
Clinical dept costs (CC)     146        820    238,907    40,597    37,603
Ancillary costs (CA)         146      1,064    128,465    25,807    23,717
Weighted discharges (O)      146        191     13,441     3,099     2,591
Weighted inpatients (OI)     146         21     13,403     2,752     2,549
Weighted day care (OD)       146          0      1,702       348       411
Long term care (OL)          146          0      9,283     1,564     1,570

3. Measuring efficiency

Measuring efficiency as a radial measure representing the possible proportional reduction in inputs while staying within the production possibility set is an idea that originally stems from Debreu [9] and Farrell [10]. Farrell's specification of the production possibility set as a piecewise linear frontier was followed up by Charnes et al. [11], who also originated the term data envelopment analysis (DEA). For an overview of the DEA literature, including numerous applications to health, see [12,13]. Farrell's original measure assumed constant returns to
scale (CRS), and was decomposed into measures of scale efficiency and efficiency relative to a variable returns to scale (VRS) technology in [14]. Implementation for piecewise linear technologies was provided by Banker et al. [15]. Their DEA formulation has served as the main model in most efficiency studies, and is the basic model in this study. Formally, if x is a vector of inputs and y is a vector of outputs, the production possibility set is defined as

P = {(y, x) | y can be produced from x}    (1)

The efficiency of an input–output vector (y, x) can then be defined as

E = min_{θ,λ} {θ/λ | (λy, θx) ∈ P}    (2)

This is a relative measure, which compares the input–output vector (y, x) with the vector that is of optimal size, keeping constant the mix of inputs and the mix of outputs, respectively.

Measures of efficiency are obtained from a finite sample, where the methodology envelopes the data as tightly as possible. This means that when samples are finite, efficiency measures will be biased. Bias-corrected efficiency estimates can be obtained by using bootstrapping methods [16]; that is, by randomly drawing efficiency levels from an estimate of the true efficiency distribution, with exogenously given inputs and outputs. Bootstrapping is also used to construct confidence intervals for the efficiency measures.

As noted, there is no intuitively "correct" way of measuring inputs and outputs. Any attempt to account for case-mix differences implies some form of aggregation. Aggregation imposes restrictions on the technology, because it implies that the marginal rates of substitution (transformation) between inputs (outputs) are constant (and correct). Rather than performing an ad hoc aggregation, one can statistically test to what degree a disaggregated model differs from an aggregated model with respect to the distribution of measured efficiency [17]. Thus we begin our analysis with a simple "one input–one output" model and then proceed to test whether a more disaggregated approach gives a more accurate representation of the production technology. The null hypothesis is that the added variable has no significant impact. We note that the models are nested, in the sense that an aggregated model is nested within a disaggregated model and a CRS model is nested within a VRS model. Efficiency estimates in a disaggregated model will therefore be equal to or larger than those in an aggregated model, and the assumptions underlying tests such as the Mann–Whitney rank order test will not be fulfilled in this case. Others [18] have suggested, based on Monte Carlo runs, that both a Kolmogorov–Smirnov test and an ordinary t-test of differences of means perform well when sample sizes are reasonably large (100+). Generally the t-test has more power. In our case the sample size is 146; we therefore use a t-test of differences of means. The test structure is described in Table 2.
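For concreteness, the input-oriented VRS model of Banker et al. [15] can be solved as one linear program per department. The sketch below, using scipy with random demo data, is our own illustrative implementation, not the code used in the study; note that lam here denotes the intensity weights of the linear program, not the output scaling factor in Eq. (2), and the bias-correcting bootstrap of [16] is omitted for brevity.

```python
# A minimal sketch of input-oriented DEA under variable returns to scale
# (Banker et al. [15]): one linear program per department. Illustrative
# only; the demo data are random, not the study's data.
import numpy as np
from scipy.optimize import linprog

def dea_input_vrs(X, Y):
    """X: (n_units, n_inputs) costs; Y: (n_units, n_outputs) activity.
    Returns the radial input efficiency score (<= 1) for each unit."""
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # decision variables: z = [theta, lam_1, ..., lam_n]
        c = np.zeros(n + 1)
        c[0] = 1.0                        # minimise theta
        A_ub, b_ub = [], []
        for i in range(m):                # sum_j lam_j x_ij <= theta x_io
            A_ub.append(np.r_[-X[o, i], X[:, i]])
            b_ub.append(0.0)
        for r in range(s):                # sum_j lam_j y_rj >= y_ro
            A_ub.append(np.r_[0.0, -Y[:, r]])
            b_ub.append(-Y[o, r])
        A_eq = [np.r_[0.0, np.ones(n)]]   # VRS: intensity weights sum to 1
        b_eq = [1.0]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(None, None)] + [(0.0, None)] * n)
        scores[o] = res.fun
    return scores

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, (30, 3))       # three cost inputs
Y = rng.uniform(1.0, 10.0, (30, 3))       # three output measures
print(dea_input_vrs(X, Y).round(3))
```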
Table 2
Test for differences between models with varying degree of aggregation

H0                               HAlt                               Change in mean E   T-test   P-value   Result
(C, O; CRS)                      (CO, CAC, O; CRS)                        3.37           9.6      0.00    Reject H0
(CO, CAC, O; CRS)                (CO, CA, CC, O; CRS)                     7.86          13.1      0.00    Reject H0
(CO, CA, CC, O; CRS)             (CO, CA, CC, OI, OD; CRS)                2.91           9.9      0.00    Reject H0
(CO, CA, CC, OI, OD; CRS)        (CO, CA, CC, OI, OD, OL; CRS)            5.02           7.7      0.00    Reject H0
(CO, CA, CC, OI, OD, OL; CRS)    (CO, CA, CC, OI, OD, OL; VRS)            3.31           6.5      0.00    Reject H0
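A sketch of how one such comparison might be carried out is given below. The score vectors are simulated stand-ins, and we use a paired t-test since the same 146 departments are scored under both models; this is our assumption, as the paper only states that a t-test of differences of means was used.

```python
# Sketch of the nested-model comparison behind Table 2: test whether the
# mean efficiency score changes significantly when inputs/outputs are added.
# Score vectors are simulated stand-ins, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
eff_h0 = rng.uniform(40, 90, 146)                   # aggregated model
# a nesting (disaggregated) model can only raise DEA scores
eff_halt = np.minimum(eff_h0 + rng.uniform(0, 15, 146), 100.0)

# paired test; assumes both models score the same 146 departments
t, p = stats.ttest_rel(eff_halt, eff_h0)
print(f"change in mean E = {(eff_halt - eff_h0).mean():.2f}, "
      f"t = {t:.1f}, p = {p:.3f}")
```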
Following this procedure we find that adding inputs and outputs to the basic one input–one output model significantly changes the distribution of efficiency scores. The preferred model is thus the most disaggregated one, using three inputs and three outputs and assuming variable returns to scale. We note that this is as far as our data lets us disaggregate; we cannot exclude the possibility that further disaggregation would lead to efficiency distributions that differ significantly from the one chosen here. Thus a general point is that efficiency distributions, and therefore also the ranking of departments, may vary with the level of aggregation. This will have consequences both for the monitoring of efficiency and for resource allocation, a point we return to in the discussion below. In the discussion that follows we also choose to retain our basic aggregated model. In this model efficiency is simply measured as the ratio of costs to the number of weighted discharges; this is, however, a measure that is often used in the sector for practical monitoring purposes.
4. Results and discussion

The two efficiency measures, from the most and the least aggregated models, are summarised in Table 3. As expected, the aggregated model yields the lowest average level of measured efficiency and the highest variation. Average measured efficiency rises by more than 16 percentage points from the aggregated CRS model to the disaggregated VRS model. The Spearman rank correlation between the two sets of scores is 0.49, clearly indicating that efficiency measures and the ranking of departments will depend crucially on the chosen model specification.

Table 3
Bias corrected efficiency measures – two alternative models

                                        Mean    S.D.
Aggregated model                        65.6    12.1
  Ear–nose–throat/eye (n = 26)          68.1     8.8
  Maternity care/gynaecology (n = 28)   69.6     9.5
  Surgery (n = 41)                      72.9    10.3
  Medical (n = 35)                      59.6     9.7
  Children (n = 16)                     48.6     7.2
Disaggregated model                     82.0     9.5
  Ear–nose–throat/eye (n = 26)          82.4     7.9
  Maternity care/gynaecology (n = 28)   86.0     7.7
  Surgery (n = 41)                      81.7     8.7
  Medical (n = 35)                      81.5    10.8
  Children (n = 16)                     76.7    11.2
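The rank comparison reported above is straightforward to reproduce in principle; a minimal sketch, with placeholder score vectors rather than the study's data:

```python
# Rank correlation between two sets of DEA scores for the same departments.
# The score vectors below are placeholders, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
scores_aggregated = rng.uniform(40, 90, 146)
scores_disaggregated = rng.uniform(60, 100, 146)

rho, p = stats.spearmanr(scores_aggregated, scores_disaggregated)
print(f"Spearman rank correlation: {rho:.2f} (p = {p:.3f})")
# A low rho (the paper reports 0.49) means the two models rank the
# departments quite differently.
```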
From this simple analysis we can draw two important policy-relevant conclusions. First, measuring efficiency at the departmental level using case-mix adjusted hospital discharges as the only output leads to perceived differences in efficiency between departments that differ substantially from those obtained with a less restrictive, and possibly more accurate, operationalization of output. This is in line with previous research on efficiency measurement at the hospital level [4,8], but may be even more important should hospital owners wish to base the internal allocation of resources on measures of perceived efficiency. Second, even if we choose to ignore the numeric measures of efficiency, simply ranking departments by costs per DRG-weighted discharge gives a quite different picture from the one we get from more disaggregated models. Again, if these measures are used as a base for resource allocation, the implications may be serious.

We now turn to whether, and to what extent, measured efficiency varies systematically between different types of clinical departments. We choose the following (broad) groups of departments: (i) ear–nose–throat/eye, (ii) maternity care/gynaecology, (iii) surgical departments, (iv) medical departments (including internal medicine, neurology, lung medicine and gastro medicine; the internal organisation varies, and some hospitals may have more than one "medical department"), (v) children's departments. While this grouping is admittedly somewhat ad hoc, it is based on discussions with a panel of physicians, who did not offer any serious objections to the choices made. Average measures by department type are given in Table 3, and indicate that efficiency may differ substantially with the type of department. Fig. 1 shows confidence intervals for the two models and five department types. We note that a simple comparison of costs per DRG gives rise to substantial and significant differences between medical and children's departments on the one hand, and the three department types treating more "straightforward" patient groups on the other.
Fig. 1. Error plots for mean efficiency level by department. Bias corrected efficiency measures.
It is not plausible that systematic differences between department types are the result of differences in management styles. We would therefore assume that these differences reflect flaws in the measurement of outputs. They may, however, also reflect systematic differences between hospitals. We investigate this by running a simple fixed effects model including dummies for department types and hospitals. The results are shown in Table 4.

Table 4
Fixed effects regression (t-values)

                     Aggregated model    Disaggregated model
Intercept            61.7 (21.5)         84.1 (33.9)
Gyn/Mat              10.3 (4.40)         4.6 (2.3)
Surgical             13.9 (6.58)         0.95 (0.5)
Ear–nose–throat      8.10 (3.41)         0.2 (0.1)
Children             −10.5 (3.8)         −4.2 (1.76)
Adjusted R2          0.44                0.31

We note that the extent to which the observed differences between departments are statistically significant depends on the choice of model. Using medical departments as the reference, we find that maternity/gynaecology departments have a significantly higher level of measured efficiency, and children's departments a significantly lower level, in the disaggregated model. In the aggregated model the differences are both larger and generally more significant. Thus it would seem that using a more disaggregated model reduces the problem of output-generated inaccuracies in measures of efficiency.
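The following is a minimal sketch of such a regression using statsmodels. The data frame is a simulated stand-in for the 146 departments, and the variable names are our own assumptions.

```python
# Sketch of the fixed effects regression in Table 4: DEA scores regressed
# on department-type dummies (medical as reference) plus hospital dummies.
# The data frame is a simulated stand-in, not the study's data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 146
df = pd.DataFrame({
    "efficiency": rng.uniform(40, 100, n),
    "dept_type": rng.choice(
        ["medical", "gyn_mat", "surgical", "ent_eye", "children"], n),
    "hospital": rng.choice([f"h{i}" for i in range(16)], n),
})
# C(..., Treatment(reference='medical')) makes medical the reference group
model = smf.ols(
    "efficiency ~ C(dept_type, Treatment(reference='medical')) + C(hospital)",
    data=df).fit()
print(model.summary())
```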
The reasons behind the observed differences in measured efficiency between department types should be pursued further. As a starting point we would suggest two explanations. First, there may be structural differences between the department types related to function-specific costs, i.e. costs that are not patient specific. Children's departments may, for instance, have a higher staffing ratio due to children's extra need for care; when output is measured by DRGs covering the whole age spectrum, this is not corrected for. Second, the fact that departments rely to varying degrees on services "purchased" from ancillary and overhead departments means that they have different degrees of control over their own level of efficiency. We see scattered attempts to introduce internal markets in hospitals, and a further issue is whether these lead to improved efficiency.
5. Concluding comments

First, this analysis supports the conclusion of earlier work on the measurement of efficiency at the hospital level [3,7] that individual efficiency measures depend crucially on the chosen input and output specification. Thus both the ranking of departments and the ranking of hospitals will depend on the chosen model. At the group level (e.g. type of department) the efficiency measures are more robust, but using too simple a model specification may again lead to wrong conclusions about the relative efficiency of department types.

Second, there are large and significant differences in measured efficiency between different types of clinical departments. In particular, departments with a more "straightforward" type of activity (surgical, ear–nose–throat/eye and gynaecology/maternity care departments) have a higher level of measured efficiency than more "complex" departments such as medical and children's departments. Although we cannot substantiate this from the analyses performed here, we believe that these differences can to a large extent be attributed to errors in the measurement of output. More specifically, we question the ability of the DRG-system to sufficiently
account for case-mix differences at the departmental level. As a consequence, our third point is that caution is necessary when using measures of efficiency at the departmental level, and particularly when using the DRG-system as a base for financing and resource allocation at this level. In our view a simple DRG-based financing model at this level may lead to a systematic underfinancing of children's and (possibly) medical departments and a corresponding overfinancing of surgical, gynaecology/maternity care and eye/ear–nose–throat departments. There is therefore a need to adopt a different strategy when allocating resources to clinical departments.

In the development of such a strategy there is obviously room for further studies of efficiency at this level; we would point at three potentially fruitful directions. First, we need studies that focus on technological, structural and institutional factors that may explain the relatively large observed differences in measured efficiency between clinical departments. Second, we need studies that merge the "mechanistic" approach of efficiency measurement with other management tools currently available. Third, there is an obvious need to develop measures of hospital outputs that can be applied in analyses at the departmental level.
References

[1] Biørn E, Hagen TP, Iversen T, Magnussen J. The effect of activity-based financing on hospital efficiency: a panel data analysis of DEA efficiency scores 1992–2000. Health Care Management Science 2003;6:271–83.
[2] Gerdtham U-G, Rehnberg C, Tambour M. The impact of internal markets on health care efficiency: evidence from health care reforms in Sweden. Applied Economics 1999;31:935–45.
[3] Linna M, Häkkinen U, Magnussen J. Comparing hospital cost efficiency between Norway and Finland. Health Policy 2006;77:268–78.
[4] Street A. How much confidence should we place in efficiency estimates? Health Economics 2003;12:895–907.
[5] Ankjær-Jensen A, Svenning AR. Anvendelse af DRG til produktivitetsanalyser på afdelingsniveau. Tidsskrift for Dansk Sundhedsvæsen 2002;78(3):106–14.
[6] Olesen OB, Ankjær-Jensen A, Svenning AR. DRG til produktivitetsanalyser på afdelingsniveau – anvendelse af DEA. Tidsskrift for Dansk Sundhedsvæsen 2002;78(9):329–35.
[7] Kjekshus LE. INTORG – de somatiske sykehusenes interne organisering (INTORG – the internal organization of somatic hospitals). HERO Working Paper no. 6, Health Economics Research Programme at the University of Oslo, 2004.
[8] Magnussen J. Efficiency measurement and the operationalization of hospital production. Health Services Research 1996;31:21–37.
[9] Debreu G. The coefficient of resource utilization. Econometrica 1951;19:273–92.
[10] Farrell MJ. The measurement of productive efficiency. Journal of the Royal Statistical Society 1957;120:449–60.
[11] Charnes A, Cooper WW, Rhodes E. Measuring the efficiency of decision making units. European Journal of Operational Research 1978;2:429–44.
[12] Seiford LM. Data envelopment analysis: the evolution of the state of the art (1978–1995). Journal of Productivity Analysis 1996;7:99–137.
[13] Worthington AC. Frontier efficiency measurement in health care: a review of empirical techniques and selected applications. Medical Care Research and Review 2004;61(2):135–70.
[14] Førsund FR, Hjalmarsson L. On the measurement of productive efficiency. Swedish Journal of Economics 1974;76:141–54.
[15] Banker RD, Charnes A, Cooper WW. Some models for estimating technical and scale inefficiencies. Management Science 1984;30:1078–92.
[16] Simar L, Wilson PW. Sensitivity analysis of efficiency scores: how to bootstrap in nonparametric frontier models. Management Science 1998;44:49–61.
[17] Halsteinli V, Kittelsen SAC, Magnussen J. Scale, efficiency and organization in Norwegian psychiatric outpatient clinics for children. Journal of Mental Health Policy and Economics 2001;4:79–90.
[18] Banker RD. Hypothesis tests using data envelopment analysis. Journal of Productivity Analysis 1996;7:139–59.