Journal of Government Information, Vol. 24, No. 3, pp. 153-159, 1997. Copyright © 1997 Elsevier Science Ltd. Printed in the USA. All rights reserved. 1352-0237/97 $17.00 + .00
Pergamon
PII S1352-0237(97)00017-8
MAKING STATE AGENCY DATA MORE WIDELY USEFUL: THE IOWA POLICY AND PLANNING DATA PROJECT*
GREGORY WOOL**
204TS Parks Library, Iowa State University, Ames, Iowa 50011, USA; Internet: [email protected]
Abstract: This paper describes the scope, historical development, and accomplishments of the Iowa Policy and Planning Data Project (IPPDP). Inspired both by a mandate for state agencies to share their data resources for greater efficiency and by the research needs of economic development planners, the IPPDP is a cooperative effort of several Iowa government departments and units of Iowa State University. Its original long-term goal, the creation of a data dictionary, was soon set aside; however, substantial progress has been made toward its short-term goal, the development of a prototype online catalog of data files held in state agencies and offices, despite such obstacles as the limited development of cataloging standards and the reluctance of many agencies to commit the time necessary to document their holdings. More recent Project initiatives include a series of workshops to help policy analysts and planners locate relevant information on the Internet and the development of an online information server to make state-collected data readily available to community planners. The IPPDP represents a “virtual library” approach to making government data resources (many of them unpublished) more widely available. © 1997 Elsevier Science Ltd

Keywords: Cataloging, Data dissemination, Numeric data files, Iowa state government, Data management

BACKGROUND AND ORIGINS
One of the most significant activities of state government in Iowa is data gathering. Every agency needs information in order to carry out its mission, and collects this information through surveys, tax records, licensing applications, site visits, and other means. As a result, substantial collections of data about Iowa and Iowans are housed in various state offices, and offer a rich mine of information about public health, local government revenue, employment, public education, land use, and other topics. Except where personal privacy is an issue, most of this information is publicly available.

While these data files have great potential research value for a wide variety of people, from college faculty and students to small-town economic development planners, few people know that they exist, let alone how to gain access to them. For that matter, much of the information one agency collects can be useful to policy planners in another agency, but often agencies have no way of knowing what is available and collect the same data a second time (not, most would agree, the best use of taxpayers’ dollars).

*The author wishes to thank Mark Imerman, Dilys Morris, and the editorial reviewers of this article for their comments and suggestions.
**Gregory Wool is an assistant professor and Monographs/Special Projects cataloger at the Iowa State University library. He has served as project cataloger for the Iowa Policy and Planning Data Project since its inception.
Most Iowa government data files are now in machine-readable form. Other computerized files of data from the federal government (e.g., U.S. Census Bureau tapes) and various academic research projects (e.g., studies archived by the Inter-university Consortium for Political and Social Research, or ICPSR) have been acquired by academic departments and research units at Iowa State University (ISU) as well as by some state agencies; information about these files, too, has not been readily available.

Due to numerous factors, including fiscal constraints and an already full agenda of changes in the information environment to be managed, libraries in Iowa have generally been slow to provide access to government data files. State agencies, for their part, usually lack incentives and expertise to do more than respond to the occasional request for their own data.

By the dawn of the 1990s some researchers were beginning to see the need for some form of centralized access to machine-readable data files owned by the State of Iowa. This perceived need led, after a couple of false starts, to the creation of the Iowa Policy and Planning Data Project (IPPDP).

The IPPDP is essentially an outgrowth of Iowa State University’s Rural Data Project, itself a major user, repackager, and reference mediator of state-produced data. However, IPPDP also counts among its members a consortium of nine executive departments of the state of Iowa (Transportation, Employment Services, Education, Economic Development, Revenue and Finance, Public Health, Natural Resources, Management, and Cultural Affairs) plus ISU’s Agricultural Experiment Station and Extension to Communities. All parties agreed to provide funding, analytical expertise, or information about data holdings to the Project. Apart from some outside grants, the Iowa Department of Transportation and Iowa State University have provided the major funding for the IPPDP from the beginning.
The IPPDP was launched in 1992 with the stated goal “to enhance access to electronic data files for policy makers, planners, and researchers” [1]. The initial project description defines this goal more precisely, first by excluding from the Project’s scope data files perceived to be of strictly internal use, and second by defining four levels of data access: location, determination of quality and accuracy, data-item definition (data dictionaries), and direct retrieval. While initial project objectives included the development of mechanisms for determining the quality and corporate value of datasets, as well as a common data dictionary format, the provision of desktop access to datasets was left out of the Project’s scope as being a technical rather than a management problem [2]. Cooperative data management was the purpose of the IPPDP; identifying and locating datasets and providing basic bibliographic information about them to potential users were tasks considered fundamental to facilitating data use.

Developing a catalog was seen as a necessary first step. Indeed, it became the major undertaking of the IPPDP’s first three years. Shortly after the Project began, the data-evaluation and data-dictionary objectives were set aside as being beyond the resources then available. During the course of the Project, however, other needs and opportunities arose that led to two additional IPPDP ventures: a series of workshops on access to electronic data sources for planners and administrators, and a World Wide Web server. The development of the catalog remains, for now, the most distinctive aspect of the IPPDP, and is thus the focus of attention in this article [3]; the newer ventures will be discussed briefly as well.

CATALOG DEVELOPMENT
For building a catalog, the Project coordinator (who is also coordinator of the Rural Data Project) turned to a ready source of expertise, the ISU library. An agreement with the library committed a portion of three staff members’ time to the Project in return for funding of the appropriate proportions of salaries and benefits. As a result, a professional library cataloger became the IPPDP cataloger on a half-time (later, quarter-time) basis, the online catalog support specialist provided technical support, and the assistant director for Technical Services joined the IPPDP Board of Directors.

A search for projects with a similar scope and purpose on which to model IPPDP work turned up only two possibilities: the Florida Growth Management Data Network and a proposed Directory of Electronic Information Resources for the University of California system [4]. The former used the Federal Data Transfer Standards to create a virtual library of geographic information system (GIS) data held by various government bodies or agencies in Florida, while the latter was to be a catalog of electronic data files owned by any of the University of California campuses and not necessarily held in a library. Since the Florida project appeared to be working outside of library cataloging standards and since the California proposal had yet to receive funding, the IPPDP proceeded largely on its own.

About a year and a half later, IPPDP staff became aware of the federal project to develop a government information locator service (GILS), and they have continued to monitor its progress. The GILS project is remarkably similar to IPPDP in its aims (intellectual access to agency information resources for other agencies as well as the general public) and approach (a metadata catalog, with information supplied by the agencies under a common standard). There are differences as well. While GILS encompasses all kinds of information resources, the IPPDP’s focus is on numeric data files. GILS has developed its own metadata standard and supports a variety of record formats, while IPPDP relies on the existing standards used in libraries.
Finally, the GILS approach, backed up by legislation and executive mandate, places more responsibility on the individual agencies for record creation and maintenance than is feasible in IPPDP [5].

Basic Assumptions
At the outset, three decisions about the catalog were made.

(1) The catalog would be online as a specialized database within ISU’s NOTIS system. This enabled the Project to take advantage of an already available software platform with a built-in access mechanism. The advantage to the Library was that the catalog would provide ISU faculty and students with a unique service by informing them about the data resources of Iowa state government. The major disadvantage was that since Internet access was unavailable to most state agency employees (at least before 1995) they could consult the catalog only by dialing in with a modem, a method sufficiently inconvenient and expensive to discourage casual use and exploration. The catalog was developed in the test region of the NOTIS system and was thus shielded from casual public access until January 1995, when it was added to the menu of ISU’s online catalog gateway. The Project cataloger took advantage of NOTIS’s record display customization features to seek configurations especially useful for numeric data files and their users.

(2) Prevailing library cataloging standards, notably AACR2 and MARC, would be followed as far as possible. Use of MARC (MAchine Readable Cataloging, a family of standards for communicating bibliographic data) made a NOTIS-based catalog possible. Adherence to AACR2 (Anglo-American Cataloging Rules, second edition) meant that the Project could use cataloging copy from OCLC (the world’s largest cooperative supplier of
bibliographic records) where appropriate. Using Library of Congress Subject Headings (LCSH), another standard for most academic libraries, provided structured subject access without a process of reinvention. In general, use of prevailing standards made possible both the best use of existing staff skills and the sharing of Project records with the wider library community. It was expected that the Project cataloger would use his experience with IPPDP to contribute to the ongoing development of cataloging standards for computer files.

(3) Records would be based on information submitted by the data custodians. As many of the datasets to be cataloged were stored on magnetic tape for loading onto a mainframe computer and were held at sites remote from Ames (where ISU is located), it was considered practical to have the data custodians provide cataloging data for their files by filling out a questionnaire. The questionnaire was designed by the Project coordinator (an experienced user of numeric data files) with input from the cataloger. The intent was to capture the various types of information users of the catalog would want to see, while easing, where possible, the task of completing the questionnaire. For instance, instead of asking for a summary of the contents, several narrower questions were designed regarding categories of variables, strengths and weaknesses of the dataset, and mandate or purpose of data collection, the answers to which were then combined (along with miscellaneous comments) into a “summary” for the catalog record.

Cataloging Accomplishments and Obstacles

As of June 1, 1996, the catalog (called at first simply the IPPDP catalog, later named IowaBase, then renamed Iowa Planning Data Catalog or IPDC) contained records for 424 items. Of these, 115 represent original cataloging of datasets held (and in most cases, compiled) by various state agencies and the Rural Data Project.
(Eighty of the 115 resulted from a cooperative venture to create an internal catalog for the state’s Department of Public Health.) An additional 257 represent all the cataloging copy found on OCLC for ICPSR files held by the ISU Department of Political Science; these records were edited extensively for consistency and, in some cases, to add needed information. The remaining records, created or added since October 1995, describe 52 Internet sites identified by Project staff as significant data sources for planning and policy research.

During the summer of 1993, the IPPDP conducted an evaluation of the catalog. Project staff identified 50 representatives of participating agencies. Each representative was sent a set of access instructions, a script for a brief introductory tour, and a questionnaire. Twenty-four responses were received. After the results were compiled, the Project cataloger conducted follow-up interviews with five of the respondents in order to gain more in-depth information. The feedback received was generally positive and indicated a widespread interest in the success of the Project. It also included several ideas for improving the content and arrangement of the record displays, as well as concerns about access and navigation difficulties on the part of dial-up users. In response to these suggestions and concerns, Project staff made a number of improvements to the catalog interface.

The cataloging phase of the IPPDP represents an early attempt to adapt a standard bibliographic organization model to remotely held electronic resources. Within certain limits, the model has worked well. AACR2, MARC, and LCSH have provided a sturdy framework for resource description and indexing, even though some liberties had to be taken with MARC field definitions in order to provide distinctive labeling and placement for certain kinds of note-field information. One of these modifications involved
adapting the 535 field (Location of Originals/Duplicates) to hold the contact information for access to the remotely held file (the record element making IPDC the catalog of a “virtual collection”) and to facilitate its labeling and positioning within a record display.

The nature of machine-readable numeric data files, which normally lack anything analogous to a title page and are often stored on reel-to-reel tape, precludes basing their description upon direct examination. AACR2’s exemption of remotely held resources from the normal requirement for physical description minimizes one aspect of this problem for the IPPDP. For the rest, having to rely on the custodians of data files as sources of bibliographic and indexing information means that information is only as good as each custodian’s understanding of and commitment to meeting the reference needs of users. Strong working relationships, therefore, are necessary between the cataloger and the various data custodians, but committing adequate time and effort to such relationships is a challenge for all parties when day-to-day job concerns seem more pressing.

The mechanics of record creation were quite manageable. Items on the questionnaires filled out by data custodians were mapped relatively easily to MARC fields; with a little more effort, entries in publishers’ catalogs (for the ICPSR and two local GIS projects) could be similarly deconstructed and reassembled. The IPPDP cataloger developed standard interpretation and inputting procedures for these source materials that a student worker was able to use successfully. This means that when (and if) the IPPDP receives direct, ongoing funding from the state of Iowa, the cataloging process should be secure in the hands of a member of the support staff, with professional supervision, policy coordination, and collection development on a part-time basis.

Unfortunately, very few records, aside from those for ICPSR files, have been created since January 1994. Cataloging of GIS files at ISU and the Iowa Department of Natural Resources has been on hold while the cataloger evaluates the usefulness of cataloging files already indexed on the World Wide Web and while existing records await necessary revisions for contribution to OCLC. At the same time, several participating agencies have been unable to commit the time and resources necessary to document their publicly available data for the IPPDP.
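
As a rough illustration of the record-creation mechanics described in this section, the following Python sketch maps questionnaire answers to MARC-style fields and combines the narrower answers into a single summary. It is not the Project's actual procedure: the questionnaire keys, the `MarcField` class, and the function name are hypothetical, and the use of tags 245 (title), 520 (summary), and 650 (subject heading) is an assumption based on common MARC practice. Only the adapted use of field 535 for contact information comes from the text.

```python
from dataclasses import dataclass


@dataclass
class MarcField:
    """A simplified (tag, value) pair; real MARC fields also carry indicators and subfields."""
    tag: str
    value: str


# Hypothetical questionnaire keys whose answers are merged into the summary note.
SUMMARY_KEYS = ("variables", "strengths_weaknesses", "purpose", "comments")


def questionnaire_to_record(answers: dict[str, str]) -> list[MarcField]:
    """Build a list of MARC-style fields from one custodian's questionnaire answers."""
    fields: list[MarcField] = []

    # Assumed mapping: dataset title goes to field 245.
    title = answers.get("title", "").strip()
    if title:
        fields.append(MarcField("245", title))

    # Combine the narrower answers, in a fixed order, into one summary (assumed field 520).
    parts = [answers.get(key, "").strip() for key in SUMMARY_KEYS]
    parts = [part for part in parts if part]
    if parts:
        fields.append(MarcField("520", " ".join(parts)))

    # Per the article, field 535 was adapted to hold contact information for access.
    contact = answers.get("contact", "").strip()
    if contact:
        fields.append(MarcField("535", contact))

    # Assumed mapping: semicolon-separated subject headings go to repeated 650 fields.
    for heading in answers.get("subjects", "").split(";"):
        heading = heading.strip()
        if heading:
            fields.append(MarcField("650", heading))

    return fields
```

A caller would pass one dictionary per questionnaire and load the resulting fields into whatever record format the catalog system expects; missing answers are simply skipped rather than filled in.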
RECENT PROJECT INITIATIVES
Workshops on Electronic Data Access
The slowdown in agency participation has been attributed in part (and somewhat paradoxically) to an unfamiliarity with electronic data access on the part of many agency professionals, especially those with the budget authority to bring Internet access to their agencies. The IPPDP moved to address this problem by presenting a series of workshops for agency administrators. These workshops, consisting of an introductory tour and a review of options and costs for connection to the Internet, were conducted at ISU’s Parks Library by staff from the library’s Automated Systems Division. A total of 60 administrators received instruction in four workshops held in November 1994 and January 1995; requests for additional workshops could not be accommodated due to scheduling problems in the library [6].

The success of the workshops encouraged Project staff to organize a one-day conference entitled “Information, Communication Technology, and Community Change,” scheduled for June 1995 with several nationally known presenters. Insufficient advance registrations, however, led to cancellation of this event [7]. Nevertheless, interest in Internet training remains high, and the Project organized another series of workshops for
winter and spring of 1995/96, this time presenting subject- and application-oriented views of Internet resources for state policy analysts and administrators [8].

Iowa PROfiles (WWW server)
Iowa PROfiles (Public Resources Online) is the result of a grant awarded in October 1994 to the IPPDP by the U.S. Department of Commerce through its National Telecommunications and Information Infrastructure Assistance Program (NTIIAP). This prototype World Wide Web information server was developed to assist the reference and data-delivery activities of the IPPDP by offering quick access to frequently requested data. It is also meant to address the original expectations of IPPDP for a data-delivery mechanism.

The centerpiece of Iowa PROfiles is a hypertext interface to summary statistics for each of Iowa’s 99 counties (and eventually, its towns and cities as well). Included are demographic data from the U.S. Census and economic data from the U.S. Bureau of Economic Analysis, the Iowa Department of Revenue and Finance, and the Iowa Department of Management. Other state-produced information sources (such as the Iowa Film Office’s directory of resources for movie and TV production in Iowa) are also available here, along with links to other relevant WWW sites [9].

Iowa PROfiles is designed to use the latest information technology to offer the public quick access to a wide variety of data, while maintaining agency control over its own data. It is also being further developed to provide instruction and assistance in data analysis. However, it is essentially a ready-reference tool, not the ultimate solution to state data-management needs. (Unlike some states, Iowa does not have a centralized data center for state government.) The need to mirror databases for quick access limits the scalability of a single server. Because of this, Project staff believe that comprehensive data-management and in-depth access needs will be best served by a robust catalog with direct links to dataset documentation and access services. Examples of this approach include the National Engineering Education Delivery System (NEEDS) and the Internet Catalog being developed at OCLC [10].
CONCLUSIONS

During the past four years, the Iowa Policy and Planning Data Project has been building a virtual library of state-owned data resources for planning and policy analysis. The term virtual library, as conventionally understood, applies only to Iowa PROfiles, as an organized repository of machine-readable documents and hypertext links to networked resources. A dictionary definition of virtual, however, “being such in power, force, or effect, though not expressly such” [11], suggests that any collection of document surrogates, such as a bibliography or a catalog, could be considered the user interface to a virtual library. The power of the Iowa Planning Data Catalog is that it brings together, and makes public, information about hundreds of state-owned data files, information previously unavailable or difficult to find. Thus it “virtually” turns a random, physically scattered array of government databases into an intellectually cohesive collection.

The IPPDP was begun in order to address the need for a data-delivery vehicle. Its founders, however, recognized the importance of resource organization as the necessary foundation of data delivery. Thus, somewhat unfashionably, the IPPDP has been pursuing and exploring the bibliographic approach to data management; despite the problems it has encountered, most notably the necessity and difficulty of effective
teamwork with data providers, its part-time staff has developed a working model and gathered a body of information that will facilitate state-level data management and delivery. Meanwhile, with Iowa PROfiles, the Project has launched a promising experiment in reference data delivery and user assistance. These efforts are complemented and enhanced by the workshops through which the IPPDP is introducing policy planners and administrators to the uses of networked electronic information resources.

Through its cataloging of data files, its electronic information workshops, and its World Wide Web server, the IPPDP is addressing, at the prototype level, the three things state government needs to take full advantage of the electronic information resources available to it: management information, education for access, and a direct-delivery mechanism. In doing so, it is laying the foundation of an electronic data library for Iowa state government and the people it serves.

NOTES

1. “Iowa Policy and Planning Data Project” (Technical Services Division, Iowa State University Library, Ames, 1992, photocopy), 1.
2. “Iowa Policy and Planning Data Project”, 2-3. At that time, network applications such as Gopher and the World Wide Web had not yet come into widespread use.
3. At this writing, the Iowa Planning Data Catalog can be reached as follows: telnet (or tn3270) to scholar.iastate.edu; at the Command prompt type scholar; at the Database Selection prompt type ipdc.
4. See David Stage, “A Multi-Agency Management Structure to Facilitate the Sharing of Geographic Data” (Technical Services Division, Iowa State University Library, Ames, 1991?, photocopy) and Mary Engle and Clifford Lynch, Directory of Electronic Information Resources: A Feasibility Study, Technical Report, no. 4 (Oakland: Division of Library Automation, University of California, Office of the President, 1990).
5. See: The Government Information Locator Service (GILS): Report to the Information Infrastructure Task Force (May 2, 1994) (URL http://www.usgs.gov/gils/gilsdoc.htm).
6. Mark Imerman, “Iowa Policy and Planning Data Project, Fiscal Year 1995 Annual Report” (Technical Services Division, Iowa State University Library, photocopy), 4.
7. Imerman, 4.
8. Imerman, 11.
9. See: Iowa PROfiles (URL http://www.profiles.iastate.edu).
10. See: NEEDS: National Engineering Education Delivery System (URL http://bishop.berkeley.edu/NAS.html); and InterCAT: A Catalog of Internet Resources (URL http://www.needs.org).
11. Random House Dictionary of the English Language (unabridged ed., 1967), s.v. virtual.