Can measuring results produce results: one manager's view

Evaluation and Program Planning 24 (2001) 319–327

www.elsevier.com/locate/evalprogplan

J.E. Hakes*
Jimmy Carter Presidential Library, 441 Freedom Parkway, Atlanta, GA 30307, USA

Abstract

Based on his experience as head of a small federal agency, the author found that performance measures made a very positive contribution to the organization, as long as the measures were credible and allowed to evolve over time in ways not initially foreseen. Measures were particularly valuable for tracking impacts on external customers and identifying internal issues with product quality. Published by Elsevier Science Ltd.

Keywords: GPRA; Performance measurement; Performance metrics; Energy information; Administration; Results measurement; Federal evaluation

Passage of the Government Performance and Results Act (GPRA) in 1993 started a new period of experimentation in the federal government. To meet the requirements of the Act, agencies educated themselves about how better to identify outputs and outcomes of their activities and to establish appropriate measurement systems to track them. Since successful models of measuring results were relatively rare in government, guidance was sometimes sought from the experience of private companies. With no standard set of goals and measures suitable for every agency, many approaches were tried, both at the pilot program stage and at full implementation in FY 1999. Congressional committees have graded agency GPRA plans from 'A' to 'F', like term papers submitted to a teacher. While such evaluations of GPRA implementation have value, they should not distract from more serious questions. Evaluations of the expanded use of performance measures need to analyze what these early efforts tell us about the prospects for genuine reform. What can we learn from the early successes and failures that can benefit later efforts? Do federal managers believe measuring results enhances the quality of government services? Should we expect to see actual improvements in the way government operates as a result of these efforts? Although these are important questions, public discussion of them has not attracted a wide enough variety of voices. Members of Congress, academics, planning offices, and others have discussed the successes and failures of GPRA.

But senior managers, both career and political, have been largely silent in public debates about managing for results. This void is unfortunate, because managers with actual line responsibilities are in a good position to make informed judgments about progress so far in measuring for results.

This paper focuses primarily on the issue of developing relevant and effective measurement systems and is based on my experience as a presidential appointee who in 2000 finished a 7-year stint as Administrator of the Energy Information Administration (EIA) at the US Department of Energy. EIA was an early pilot project for performance measures and continued to adapt its metrics as experience indicated which approaches seemed most productive. For more information on the Energy Information Administration, see 'Energy Information Administration' (1999 and 2000). The length of my tenure and personal interest allowed me to work with career managers and employees at EIA to develop measurement systems suited to EIA's needs and to observe the actual impact of those systems. EIA's experience, while it cannot be deemed typical or indicative of what has happened generally, does contribute to an expanding record of agency experience available to other government agencies. It provides guidance on the value of establishing systems of performance measurement focusing on results and on approaches useful to effective implementation of this concept.

In June of 2000, the author left the Energy Information Administration for his current position as Director of the Jimmy Carter Library and Museum. The views expressed in this article are solely those of the author.
* Tel.: +1-404-331-3942; fax: +1-404-730-2215. E-mail address: [email protected] (J.E. Hakes).

1. Value of measuring results

Some might think it irrelevant to ask whether measuring results is worth the effort. It is, after all, required by law. Despite the legal mandate, however, agencies do have choices about the share of resources devoted to planning
and measurement, about whether to involve top managers or leave these issues to the planning office, and about how vigorously to utilize the data collected. I can attest that managers have many issues other than measuring results begging for attention. As a result, agencies should assess the value of identifying the results they would like to achieve, creating new measurement systems, and incorporating the data into agency operations.

When I returned to the federal executive in 1993, I admitted to some skepticism about whether the types of metrics used in the private sector and advocated by theorists like Peter Drucker would work well in government. In the first place, systems measuring results are not free goods. Computer hardware, software, and networks have reduced the transaction costs of collecting and disseminating performance data. They cannot, however, replace the employee and consultant time necessary to work on systems development. Second, the sometimes excessive legalism in government and the apparent lack of a functional equivalent to a profit increased the chances that any progress would be slow and that, if measures were not carefully selected, they might do more harm than good. Measuring results should obviously produce enough value to justify this time, effort, and risk.

During my tenure at EIA, I became increasingly convinced that the benefits of measuring for results far outweighed the costs. If nothing else, a focus on measuring results counters a tendency in many federal agencies to overemphasize the inputs to governmental processes. In the normal course of government, the proportion of time spent on budgeting, personnel levels, and position descriptions often overwhelms the time devoted to outputs and outcomes. Perhaps worse, expenditures and numbers of employees, rather than amounts and quality of services delivered, often become the implicit barometers of organizational success. A well-designed emphasis on planning and measurement redirects more time and attention toward results. At the Energy Information Administration, an increasing focus on customer service and the development of performance measures mutually reinforced each other and helped increase the focus on outputs and outcomes. EIA's experience supported the finding of Kathryn E. Newcomer and Roy F. Wright: "The frequent reporting of strategic goals and performance data to employees makes them more aware of their purpose in the agency and encourages them to become outcome-oriented" (Newcomer & Wright, 1997).

Measuring results also provides a basis for legitimately bragging about the accomplishments of public agencies. Many agencies have, for example, established useful and well-used web sites or maintained good records of customer service. However, I have been amazed at the number of agencies that have not collected or disseminated good data on customer usage of these sites and other services or on levels of customer satisfaction to document their achievements. This seems to me almost equivalent to a private
company not tracking its sales data. Without data, successes will likely go unrecognized, and motivation for further improvement will diminish. During most of my public speeches as Administrator, I showed a graph on the use of the EIA web site. As discussed in a case below, the growing number of users was impressive and could be used to document an important success story. Good data on issues impacting customers helped tell important stories. The ability to communicate success was an important part of relations with the Congress and the press and of building strong internal morale.

Measuring results also promotes healthy communication between managers and employees. A good strategic plan, if properly publicized, tells an agency what is expected of people throughout the agency. Accurate metrics show whether goals are being achieved. Traditional communications within agencies sometimes lurch from no communication on important issues to overloads of data, some of questionable relevancy. Performance measures should be viewed as summary statistics on matters identified as important to an agency's success. They provide information that is relevant and in the right amount. Communication is not impaired by data that are too sparse or too overwhelming. Robert S. Kaplan and David P. Norton in The Balanced Scorecard emphasize the link between measurement systems and communication, information, dialogue, and "double-loop learning" (Kaplan & Norton, 1996, pp. 12-19, 199-223). Those who advocate immediately making measures the basis of reward and punishment need to acknowledge the possibility that such use might compromise the quality of the information and impair the communications function of metrics.

Measuring results also facilitates early identification and correction of problems at the source, before they require correcting from the outside. If data show a drop in success, a quick diagnosis can identify the reasons and suggest immediate remedies. Agencies can provide better service to the public, while avoiding the necessity of having remedies imposed from the outside. If auto manufacturers were producing defective autos, they would presumably want to know immediately, not after several years and not after an outside organization had brought it to their attention. In another case discussed below, EIA avoided credibility problems by using quality measures to fix data problems before they became issues outside the agency.

Although many problems with performance can be fixed internally without outside assistance, others require additional resources. In these situations, the case must be made to administration and congressional budget authorities. Proposals that are well supported with good data should receive priority over those less well documented.

At the most basic level, metrics tell us what is working and what is not. Many managers are frustrated by taking initiatives for organization development and not knowing whether or not the effort succeeded. Good performance measures are a necessity for any progressive organization
to recognize successful strategies and discard the unsuccessful.

Finally, measuring results makes agencies more accountable to the public. The public has a right to a good accounting of what is accomplished with public money. Government agencies are complex organizations with a wide variety of customers and goals. Good summary statistics help demystify the process. Accountability is an important goal, but can be overemphasized in some of the measurement rhetoric. The great concert pianists set higher standards than do their audiences. Public agencies should set higher standards for excellence than the minimal expectations of outside overseers. I use the term 'outside overseers' to include groups external to the organization that might evaluate the agency's performance. On a regular basis, this would include appropriations and authorizing committees in the Congress and the Office of Management and Budget. On an intermittent basis, it could also include the news media, the General Accounting Office, and the courts.

In cataloging the values of measuring results, we should also note their limitations. Data can inform the decision-making process, but numbers do not make decisions. There is no good way to mechanically translate performance statistics into policies or budget allocations. At EIA, levels of performance were not used as automatic indicators of whether a program needed more or less money. It is often unclear whether poor performance indicates a need for more resources or fewer resources. Sad though it may be to some, humans will still have to make decisions, albeit decisions based on better data.

2. Characteristics of good measures

Good performance measures should be clear, credible, balanced, flexible, and tied to desired results. As obvious as these suggestions may be, failure to incorporate each of these characteristics has often undermined the effectiveness of new reforms.

2.1. Clarity

Since measures become an important part of the communications process, they should be clear and easily understood. The challenge of achieving effective communication is great, since knowledge must be translated between outside overseers with only a general knowledge of agencies' operations and agencies with their own jargon and obtuse way of communicating. Even within agencies, communications between generalist managers and highly specialized employees can be difficult. Dumping out pages full of numbers does not contribute much to the effective communication of results. Translating data into visual graphs and charts helps facilitate communication.

Simple line graphs often portray performance data most clearly. Trends can be shown year-to-year to meet the needs of outside overseers. They can also be displayed over shorter periods of time to facilitate rapid identification and correction of problems. In either case, the reader can usually grasp easily the direction (or lack of direction) of the data. Good examples of effective trend graphs can be found in the 1999 Annual Report of the US Coast Guard (US Coast Guard, 1999).
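To make the point about simple trend displays concrete, the following is a minimal sketch, not drawn from EIA's own tooling, of plotting one daily performance series both as an annual trend for outside overseers and over a recent window for rapid problem detection. The file name and column names are assumptions for illustration.

```python
# Illustrative sketch: show one performance series two ways -- a
# year-to-year trend and a short recent window. The file and column
# names ('daily_metric.csv', 'date', 'value') are assumed, not from
# the article.
import pandas as pd
import matplotlib.pyplot as plt

daily = pd.read_csv("daily_metric.csv", parse_dates=["date"], index_col="date")

fig, (ax_year, ax_recent) = plt.subplots(1, 2, figsize=(10, 4))

# Year-to-year view for outside overseers: annual totals as a simple line.
daily["value"].groupby(daily.index.year).sum().plot(ax=ax_year, marker="o")
ax_year.set_title("Annual trend")
ax_year.set_xlabel("Year")

# Short-term view for managers: the last 90 days, to spot sudden changes.
daily["value"].iloc[-90:].plot(ax=ax_recent)
ax_recent.set_title("Last 90 days")
ax_recent.set_xlabel("Date")

fig.tight_layout()
plt.show()
```

A bar chart of the annual totals would serve equally well; the point is simply that the direction of the trend should be visible at a glance.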

Clarity is particularly hard to achieve for agencies with very diverse missions. Some large cabinet departments contain programs with limited relationship to each other. Producing clear, coherent plans and reports is difficult, despite best intentions. Clearer results measurement can be developed if Congress puts less attention on the plans of cabinet departments that function almost as holding companies and more on the specific programs and the assistant secretarial level of management.

2.2. Credibility

Performance measures will not achieve much if they are not believable. They must be accurate and perceived to be accurate. Agencies must avoid temptations to exaggerate achievements, if they want outcome measures to play a greater role in external and internal communications. Credible data become particularly valuable during times of agency stress, when they can help put short-term problems into perspective. Is a current problem an aberration, or does it reflect an ongoing trend? Good numbers can provide some answers. But during times of controversy, the statistics themselves may be challenged. Performance measures need to withstand tough scrutiny, especially during periods of fierce public debate.

Because of the nature of EIA's work, a sizable portion of its employees are highly trained statisticians experienced at developing data systems that need to be credible to all sides during often contentious debates about energy. As a result, the development of credible performance measurement systems was easier than at agencies with fewer or no statisticians. All agencies should obtain the involvement of professional statisticians to improve the credibility of their reporting systems.

2.3. Balance

The use of unbalanced metrics has skewed the incentive system in some agencies. Archival agencies need to balance preservation, accessibility, and privacy. Regulatory or tax collection agencies need to balance measures on enforcement with others on customer attitudes. Data agencies need to balance accuracy and timeliness. When only one side of the equation gets emphasis, dysfunctional results can occur. Measures should not be a response to the crisis of the day. Good multiple measures balance a variety of often-conflicting concerns.


2.4. Flexibility

Emphasis on performance measures can lead to attempts to game the system. If people are simply trying to make the numbers look good without committing to overall improvements in results, performance measures will become a largely useless exercise instead of a major contributor to better results. As a result, employees being measured need to be involved in developing the measures and committed to good customer service. Also, when measures are not producing what is needed, because of gaming or any other reason, there should be minimal barriers to changing them. At EIA, a crosscutting team of employees from throughout the agency developed and scrutinized EIA-wide measures. One of the team's goals was to identify and then modify or eliminate measures that had a high potential for gaming.

2.5. Relevance

Reporting too much data can produce information overload. As a result, it is important to decide what data should be reported and what should not. Some managers want the Information Technology office or the planning office to tell them what data should be reported. However, managers should make these decisions for themselves. The interests of top-level managers may differ from those at the mid-level. At the highest level, managers must focus on public impacts and the results of interest to outside overseers. These data will normally track issues identified in the agency's strategic plan. At a lower level, operational indicators take on greater significance. However, everyone in the organization needs a general awareness of the measures on outputs, and even top managers may need to see operational measures if they are producing unusual results (e.g. numbers that are rapidly rising or falling).

3. Implementing measurement systems

Although federal agencies have complied with the requirements of the Government Performance and Results Act to submit annual reports, the General Accounting Office, congressional committees, and other commentators have expressed disappointment in the extent to which the vision of the Act is being achieved (Landauer, 2000). The Act describes the kind of report required, but does not provide much guidance on how to produce it at agencies with very diverse missions and circumstances. As a consequence, federal managers have to devise their own successful implementation strategies.

It is widely recognized that agencies must coordinate well with users and with other governmental power centers, like the Congress, to implement successful measurement systems (General Accounting Office, 1997). Agencies can enhance their understandings of missions and customer requirements through healthy dialogues with outside groups. If the issues surrounding measuring results become highly politicized,
however, constructive dialogue becomes more difficult. Opportunities to gain consensus on vital missions are lost. Federal managers caught in partisan crossfires have few options, other than attempting to repair bridges to important outside customers. The Energy Information Administration's position as a nonpolicy statistical agency facilitated its communications with outside audiences, since its status made it relatively easy to work across the branches of government and across party lines.

Less recognized than the importance of external communication is the need to develop successful implementation strategies within agencies. For instance, like any cultural change, there needs to be a motivational foundation on which to build results measurement. Some advantage in dealing with the budget process that resulted from development of such systems might provide strong incentives. However, the reality of such an advantage is sufficiently unclear at this point to limit this prospect as a motivational tool. Agencies that want to develop strong systems to measure results should base them on some broad commitment to customer service and civic mission.

At the Energy Information Administration, seven employees were sent to professional training in customer satisfaction in 1994. The returnees then trained most of the remaining EIA employees and served as 'customer advocates' on many agency committees. The attention to customer concerns created a positive environment for increasing the focus on results. In addition, my monthly column in the agency newsletter discussed the concepts of civic mission developed by Robert Denhardt (1992) in The Pursuit of Significance. Employees were encouraged to discuss how their work benefited the public. When systems of measurement focus on meeting the needs of customers and public responsibilities rather than the whims of senior management, momentum need not be lost during the inevitable turnovers in political leadership.

If results measurement is to represent a commitment to achieve improved performance rather than just a new set of rules, managers will need to deal with the fear of employees that measurement will provide new opportunities to assign blame and punishment. At EIA, strong emphasis was put on positive results and written commitments were made to avoid using measures to play 'gotcha'.

Agencies should also establish, in consultation with outside overseers, reasonable expectations about what results should and can be measured. Not every ultimate result can be measured, at least not at a cost that is reasonable. On the other hand, some agencies are too reluctant to measure results, because they do not control all the factors that affect success. Most agencies needed to move further up on the results ladder, but it is also possible to go too far in attempts to measure the fundamentally unmeasurable. (The author discusses the problems of identifying reasonable results to measure in Hakes, 1996, pp. 1, 2, 9.)


Fig. 1. EIA web daily user sessions.

3.1. Case one: measuring usage of the EIA web site

Three graphs from the Energy Information Administration illustrate two case studies demonstrating some of the points above. These stories also reveal some of the learning that occurred as measurement systems were implemented. The first case involves capturing data on the use of EIA's web site. This information provided important feedback on the most dynamic aspect of customer behavior during the 1990s.

The numbers on usage of the web site can be displayed by month or year using either bar or line graphs. In such form, the figures would show the high cumulative use of the site (6.4 million user sessions in its first 5 years) and a pattern of dramatic growth. Data on customer usage may not provide definitive information on ultimate outcomes (e.g. the benefit of the information being used), but usage is, nonetheless, something beyond an output (e.g. putting the information on the web). In other words, usage of EIA products and services can be considered a result, though not an ultimate result.

In Fig. 1, each dot represents a day's number of user sessions. In comparison to line or bar graphs, the added detail makes the presentation somewhat more complex and difficult to discern at low resolution. Still, it incorporates the broad trend pattern in the monthly or annual displays and adds valuable insight on seasonality and short-term factors affecting use of the site.

This graph on web site usage illustrates several issues discussed in this paper. First, the data did, in fact, allow EIA to brag effectively about its accomplishments in making information more accessible to the public. Positive letters and comments were also helpful in showing the popularity of EIA's web products. However, any unquantified claim that the web was playing an expanding role in making information available paled in comparison to those based on systematic data. At EIA, graphs on web site usage were often included in public speeches, briefings of congressional staff, award applications, and press interviews to document and publicize the achievements of the agency.
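Daily counts of the kind shown in Fig. 1 can be approximated from ordinary web server logs. The sketch below is illustrative only and is not EIA's actual methodology: it assumes a hypothetical log extract with 'timestamp' and 'client_ip' columns, uses a 30-minute inactivity gap to split a visitor's hits into user sessions, and screens out an assumed internal address range, anticipating the sessions-versus-hits and internal-usage points discussed later in this case.

```python
# Illustrative sketch: turn raw web-log hits into daily user-session counts.
# The CSV layout, the 30-minute session gap, and the internal address
# prefix are all assumptions; none of these details come from the article.
import pandas as pd

SESSION_GAP = pd.Timedelta(minutes=30)
INTERNAL_PREFIX = "10."  # hypothetical internal network to exclude

hits = pd.read_csv("access_log.csv", parse_dates=["timestamp"])

# Count sessions, not hits, and screen out the agency's own traffic.
hits = hits[~hits["client_ip"].str.startswith(INTERNAL_PREFIX)]
hits = hits.sort_values(["client_ip", "timestamp"])

# A new session starts when the same client has been idle longer than the gap.
idle = hits.groupby("client_ip")["timestamp"].diff()
hits["new_session"] = idle.isna() | (idle > SESSION_GAP)

# One count per calendar day -- one dot per day in a graph like Fig. 1.
starts = hits[hits["new_session"]]
daily_sessions = starts.groupby(starts["timestamp"].dt.date).size()
print(daily_sessions.tail())
```

The same grouping could also yield the span between a session's first and last hit, which is one way the brief-versus-extended visit question raised later in this case could be examined.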

There was no other way of demonstrating so convincingly that large numbers of people were using EIA's products and services. In 1993, it was difficult to demonstrate the nationwide impacts of a statistical agency. Lists of subscribers to print publications indicated some broad interest in energy statistics. However, the number of subscribers was relatively small, and some might raise questions of whether subscribers (many of whom received their publications without cost) were actually readers. The combination of a strong web site with data documenting its usage created new opportunities to demonstrate national impact.

Second, in aggregate form, these data demonstrated to agency employees high levels of customer interest, thereby raising assessments of the agency's significance to the public. In disaggregated, product-by-product form, richer detail in usage data gave individual employees and work groups updates on how much and even what parts of their products were being opened. The feedback from customer data supplied good motivation to upgrade and update web products.

Third, in daily form, the data provided rapid feedback on any dramatic changes in customer usage. The data indicated, for instance, when the server went down (rarely) and when public focus on energy issues produced sudden surges of web visits. The impact of events such as price spikes, press conferences, or new product releases could be readily assessed using the daily data. For instance, 35,000 users visited the site on one day during a run-up in prices for heating oil and other distillate fuels in February of 2000.

Fourth, the data provided important feedback for improving specific products and services. If a product had less usage than expected, the data often stimulated new efforts to make the product more 'web friendly'. Products with low usage despite good web designs were deemed of lesser interest to customers and updated less frequently than high usage items. For example, EIA prepared Country Analysis Briefs on many countries around the world that produced or consumed the greatest amounts of energy. These summaries
were widely used throughout the world after they became available over the web. Once detailed usage measures on individual products became available to EIA authors, these data affected the order in which files for particular countries were updated. In addition, the documented popularity of summaries on the site encouraged more development of similar products.

Fifth, to maintain credibility, user sessions rather than hits were used as the focus of numbers presented to outside overseers, since the latter number can provide an inflated impression of usage. For the same reason, usage by EIA's own employees was screened out. For a rationale of EIA's system for measuring the usage of its web site, see Bradsher-Fredrick and Rutchik (1996). These issues, which might be considered somewhat esoteric, were raised by outside groups, including once when I testified before a congressional committee. The fact that these issues had been dealt with increased confidence in the numbers and protected agency credibility.

Sixth, collecting the data could be automated and had low costs. Once the appropriate software was set up, even daily reports on individual products could be made available without any special human intervention. These data eventually became available to EIA employees on the agency's Intranet. The ease with which the data could be assembled and disseminated offset most of the excuses for not collecting them.

As valuable as web site usage data were to the organization, it was still important to recognize their limitations as evaluation tools. The data did not reflect all usage of EIA's products and services. They did not indicate the identity of web site users nor the purpose or value of their visits. In addition, too narrow a focus on usage data might have provided the wrong incentives for EIA employees. These limitations were overcome by collecting other forms of customer information and keeping the data in the context of overall agency mission and strategies.

For these reasons, EIA analyzed data on the length of user visits to the web site to ascertain whether such usage was brief or extended. It also collected a substantial amount of customer data other than web site usage. It monitored trends in calls to the National Energy Information Center, citations in major newspapers, and requests to testify before congressional committees. Web site usage probably became the most important measure of the organization's value to customers, but the package of usage measures was stronger than any single measure.

EIA also conducted periodic sample surveys of telephone and web customers to assess the extent to which customers were 'satisfied' and 'very satisfied' with various aspects of EIA's services and products. The surveys also asked for customer views on issues important to EIA's planning, such as whether the availability of web products would decrease users' need for EIA to produce paper products. These surveys, administered by EIA employees, provided valuable feedback to the agency. The lack of outside interviewers
and a common, government-wide survey instrument, however, limited EIA's ability to compare its responses from customers to those for other agencies. Recently, EIA began participating in the American Customer Satisfaction Index, which will greatly increase the potential for comparison with other agencies and even private companies. For a review of the use of this Index by federal agencies, see http://www.customersurvey.gov/measure.htm.

Qualitative information also added important perspective on customers. For instance, data on web users did not distinguish between different kinds of customers. As a result, the record of a user session did not tell if it came from someone with a major impact on public policy, say a major staff member at the White House, a congressional committee, or the Secretary's Office at the Department of Energy. Such information would obviously be very helpful. In fact, such usage of the EIA web site at high levels was frequent and often came to my attention during discussions on other matters. It was important to weave such qualitative information into the story told by the hard numbers.

Out-of-context data on the amount of web usage could provide skewed incentives to employees. The competition to produce more web users, generally a positive incentive, could encourage undesirable mission creep. Products not central to the agency's legislative mandates might be put on the web to attract heavy usage. Thus, the data might produce a false sense of what constitutes success. Like all measures, data on web site usage needed to be balanced within the broader goals of the strategic plan. I saw little evidence of such mission creep at EIA, but this possibility must be considered. Measuring web site usage will probably be important for all federal agencies that deal with the public, but each agency will need to develop a set of customer measures that are best suited for its particular missions and strategic priorities.

3.2. Case two: monitoring EIA data quality

A second case of performance data helping agency effectiveness concerns EIA's attempts to deal with the quality of its energy data, an important issue for most of its customers. In the parlance of The Balanced Scorecard (Kaplan & Norton, 1996), customer usage and satisfaction are 'lagging indicators' of organizational success. That is, the quality of products and services could rise or fall before the data would pick up customer responses to these changes. As a result, measurement efforts must also address 'leading indicators' reflecting the actual quality of products and services. Outside overseers have been right in insisting that agencies give more attention to outcomes, but if insufficient attention is given to leading indicators, improved outcomes may be imperiled.


Fig. 2. Percent of motor gasoline inventory imputed on weekly surveys, 5 January 1996–17 December 1999.

Figs. 2 and 3 show levels of nonreporting of gasoline stocks by petroleum companies. Energy analysts used weekly reports from EIA on inventories of crude oil and petroleum products, of which gasoline stocks were a part, to assess the status of petroleum markets. In this and similar cases, nonreporting occurred when companies failed to submit required data on weekly surveys about their stocks
of crude oil and petroleum products. During periods of mergers, acquisitions, and downsizing in the oil industry, which were common in 1999, many companies weakened their infrastructures for reporting data to the federal government.

Fig. 3. Percent of motor gasoline inventory imputed on weekly surveys, 2 July 1999–17 December 1999.


In cases of nonreporting, EIA statisticians used imputation to calculate totals. Although imputation is a useful tool, EIA faced the prospect that lower response rates would make weekly data less reliable. Not surprisingly, data quality was identified in EIA surveys as a primary concern of data users, although many of them reported they had only limited ability to make this assessment. Any erosion in the quality and credibility of official energy data would clearly have serious adverse effects on EIA's outputs and outcomes.

For apparently unrelated reasons, reporting problems increased in the Spring of 1999, about the same time the Organization of the Petroleum Exporting Countries (OPEC) successfully cut back its crude oil production in an attempt to reduce global surplus inventories and to turn around record low prices. By the end of the year, and largely as a result of the OPEC actions, world inventories had shrunk to worrisome levels and higher prices were drawing increased attention from consumers and elected officials. Low oil inventories and high oil prices in late 1999 and into 2000 put the spotlight on oil stocks data from EIA and the International Energy Agency. From February through September of 2000, other EIA officials and I testified nine times before four different congressional committees on problems in world oil markets. In February, I, along with the Secretary of Energy and other senior US officials, delivered briefings on world oil stocks using EIA data to senior government officials in Kuwait and Saudi Arabia. EIA's senior petroleum analyst presented similar briefings to officials in other oil exporting countries. Because of the widely divergent views of what was going on in world oil markets, the credibility of the EIA stocks data assumed critical importance in many of the policy discussions.

Fig. 2 provided trend data on imputation (i.e. nonreporting) of weekly US gasoline stocks for the period 1996–1999. The information was presented in a clear line graph that helped mid- and upper-level managers easily assess the magnitude of the nonreporting problem and the direction of the trend. These data became an important part of the communications process within the agency. This is not a trivial consideration, since technical issues of potential importance often do not get adequately aired at all levels of an agency. I share with many federal managers the inability to always grasp immediately the full significance of problems raised by staff. In my experience, discussions that must rely on verbal descriptions or tables of numbers are rarely as effective as those which utilize line graphs showing major trends. Professional statisticians take much more than reporting levels into account when assessing data quality. However, this simple measure of nonresponse rates provides important information in a format easily understood.
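As an aside, a measure like the one in Figs. 2 and 3 is straightforward to compute once each weekly survey record is flagged as reported or imputed. The sketch below is a hedged illustration that uses assumed column names rather than EIA's actual survey schema, and it adds the kind of simple automatic trigger mentioned later in this case, which EIA did not in fact implement.

```python
# Illustrative sketch: weekly share of gasoline inventory that was imputed,
# plus a simple alert when that share jumps. The file, the column names
# ('week', 'volume', 'imputed'), and the 5-point threshold are assumptions.
import pandas as pd

survey = pd.read_csv("weekly_gasoline_stocks.csv", parse_dates=["week"])
# Assumed layout: one row per company per week; 'imputed' is True when the
# company failed to report and its volume had to be estimated.

weekly = (
    survey.assign(imputed_volume=survey["volume"].where(survey["imputed"], 0.0))
          .groupby("week")[["imputed_volume", "volume"]]
          .sum()
)
weekly["pct_imputed"] = 100.0 * weekly["imputed_volume"] / weekly["volume"]

# Simple trigger: flag weeks whose imputed share rises more than 5 percentage
# points above the trailing eight-week average.
baseline = weekly["pct_imputed"].rolling(8, min_periods=4).mean().shift(1)
alerts = weekly[weekly["pct_imputed"] > baseline + 5.0]

print(weekly["pct_imputed"].tail())
print("Weeks needing attention:", list(alerts.index.date))
```

Plotted as a line, the weekly imputed share reproduces the kind of display shown in Fig. 2.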

In this case, the evidence of a decline in reporting (equivalent to an increase in nonreporting or imputation) stimulated discussions about strategies to reverse the trend. EIA decided on several approaches to the problem, most notably elevating written warnings about nonreporting to higher levels of unresponsive companies more quickly. This strategy was based on assurances from senior executives at major oil companies that they would act quickly if problems were brought to their attention, and it replaced the previous practice of continuing to focus on the more junior employees actually submitting the reports. The graph shows reporting in the last quarter of 1999 generally returning to levels approaching those before the dip. As a result, nonreporting problems were largely resolved before the hearings and briefings of 2000.

The line graph in Fig. 3 covered a shorter period of time, so trends in the data could be linked to specific interventions. It displayed the extent and timing of one of the attempted fixes (writing letters to high-level company officials). The waves of letters, shown by the numbers next to the lines, and particularly the first batch, appear to have had a positive impact on reporting levels.

The experiences reflected in Figs. 2 and 3 improved the agency's confidence that it could use data to communicate problems recognized at the staff level to higher management. The same data could be used to tell whether corrective actions were working and to establish linkage between corrective actions and positive results. It ultimately helped avoid potential credibility problems with congressional committees and other important users of EIA information. Providing credible data in numerous, high-pressure situations during the oil price shock helped EIA's Petroleum Division receive an appropriation for FY2001 at a higher level than the agency request.

This measure would not have achieved prominence if EIA did not have flexibility in its selection of performance measures. The Performance Measures Committee regularly reviewed current measures. Based on this analysis, EIA dropped measures that did not add value and added measures tied to critical agency missions. The agency encouraged offices to develop measures relevant to their particular needs. In the case of data quality, it also worked to develop more technical measures of data quality that were less suitable for a lay audience, but augmented the measures on nonreporting shown in this paper.

Like the data on web users, these measures have limitations. An obvious problem is the large number of EIA surveys and the possibility of being overwhelmed with measures of survey response. To avoid information overload, I made no attempt to closely monitor response rates for all EIA surveys. In fact, there was little reason to do so, since many surveys had little variation in levels of reporting. However, it is important to rely on the judgment of others in the organization, or on automatic triggers that fire when a change of a certain magnitude is recorded, to ensure that important information becomes available in a timely way. No such trigger mechanisms were established at EIA, but in retrospect doing so would have been a positive step.

EIA also surveyed its own employees about, among other things, their perceptions of the quality of EIA's work. Employees are likely to observe positive or negative trends in data quality before customers.
As a result, trends in employee responses to questions on quality are another leading indicator on issues that ultimately affect customers.

When the GPRA mandates were first being implemented, the greatest emphasis was placed on customer measures. This approach seemed consistent with the major tenets of GPRA. It also was appropriate for EIA's strategic plan, which was based on the idea that EIA was producing high quality data, but not enough people were using them. However, in the late 1990s, restructuring of many energy industries placed considerable stress on several of EIA's data series. Moreover, with a greater number of customers and wild swings in energy prices, these data series were receiving closer scrutiny. As a result, data quality measures, such as the response rates to surveys on gasoline stocks, grew in strategic importance.

4. Conclusion

During the initial period of implementing the Government Performance and Results Act, the experience of the Energy Information Administration demonstrated that expanding the role of performance measures could enhance the ability of a public agency to, at a minimum, better explain its work to outside overseers and, even better, improve outputs and outcomes for the public. With some effort, many agencies showed they could increase focus on results and quantify many (though not all) important aspects of performance that previously lacked good measures. Good measures improved internal and external communication. When the data fed back into operations, they promoted continuous learning and improvement.

At EIA, development of comprehensive and effective measurement systems was a long, evolutionary process. In the early days, it was necessary to overcome skepticism about yet another reform mandated from afar. Also, it was important to begin with a 'defensive' strategy that focused on avoiding measures that were irrelevant, ridden with loopholes, or even dysfunctional. As the effort matured, it became important to recognize that external environments change. Effective organizations must have the capability to amend their strategic priorities, and consequently what measures they want to emphasize, from time to time.

Because of the size of the challenge, outside overseers should insist on improved measurement systems but not
expect the results to always be immediate. Measurement systems also need to strike a balance between encouraging standardized measurement tools, such as the American Customer Satisfaction Index, and allowing agencies the flexibility needed to establish and adapt strategic priorities.

As the cases at EIA suggest, developing good metrics is both possible over time and well worth the effort required. Early successes in measurement should be well recognized, since they can become the building blocks for later successes in other areas. In the EIA cases, it was also important to publicize the performance data externally and internally and to use the feedback provided to improve operations.

The obstacles to better measurement systems are many, however. As a result, broad achievement of the Act's vision is still in doubt. Nonetheless, it has become impossible to imagine any reform movement succeeding if government agencies cannot put in place effective measurement systems to assess whether new strategies of any kind are actually producing results.

References

Bradsher-Fredrick, H., & Rutchik, R. (1996). Maximizing feedback to increase customer use of the Energy Information Administration world wide web site. Paper presented at the Annual Conference of the American Statistical Association, Chicago, IL.

Denhardt, R. B. (1992). The pursuit of significance: Strategies for managerial success in public organizations. Belmont, CA: Wadsworth Publishing Company.

Energy Information Administration, US Department of Energy (1999). 1999 Annual Report to Congress. http://www.eia.doe.gov/pub/pdf/other.docs/017398.pdf.

Energy Information Administration, US Department of Energy (2000). EIA Strategic Plan: 2000–2005. ftp://ftp.eia.doe.gov/pub/pdf/other.docs/s_pln00.pdf.

General Accounting Office (1997). Managing for results: Enhancing the usefulness of GPRA consultations between the executive branch and the Congress. Testimony of L. Nye Stevens before the Subcommittee on Management, Information and Technology, US House of Representatives, March 10, 1997 (GAO/T-GGD-97-56).

Hakes, J. E. (1996). Comparing outputs to outcomes: Making sense of what we do. PA Times, 19, 10.

Kaplan, R. S., & Norton, D. P. (1996). The balanced scorecard: Translating strategy into action. Boston, MA: Harvard Business School Press.

Landauer, B. (2000). Agencies fail to measure up in delivering results. Federal Times, August 28.

Newcomer, K. E., & Wright, R. F. (1997). Effective use of performance measures at the federal level. PA Times Supplement, January.

US Coast Guard (1999). 1999 Annual Report of the US Coast Guard.