Government Publications Review. Vol. 5, No. 4. pp. 507-510. Pergamon Press Ltd. 1978. Printed in Great Britain
MACHINE-READABLE DATA FILES OF GOVERNMENT PUBLICATIONS
JUDITH S. ROWE
Associate Director for Social Science User Services, Princeton University Computer Center, 87 Prospect Avenue, Princeton, NJ 08540, U.S.A.
BUREAU OF LABOR STATISTICS DATA

The Bureau of Labor Statistics is the Federal Government’s principal data gathering agency in the field of labor economics. Since 1884 the Bureau has gathered, organized and disseminated data relating to employment and unemployment, productivity, prices, family expenditures, wages, industrial relations, and industrial safety. Most documents librarians are familiar with the various BLS publications: the seven periodicals, the handbooks, the bulletin series and the various regional publications. They are likely to be less familiar with the BLS Data Bank Files. These files include both published and unpublished time series summary data, as well as some cross-sectional microdata, and some matrix data. The BLS Handbook of Methods provides a description of each major BLS program including background and description of the data, data sources and collection methods, sampling and estimating procedures, analysis and presentation, uses and limitations.

According to Rudolph C. Mendelsohn, Assistant Commissioner for Systems and Standards, the Bureau of Labor Statistics had recently completed the first stage in building a very large data base of public statistics using the Data Bank Files and an “Information System” that gives ready access to them through on-line computer terminals. The combined data base and system are called LABSTAT for LABor STATistics. Since access to the federal computer center which houses LABSTAT is restricted, the National Technical Information Service (NTIS) is arranging with a commercial computer service to place LABSTAT data and software on machines that may be accessed by other Federal agencies and by other users as well. For more information about direct on-line access to LABSTAT data through remote terminals, contact Stuart Weisman, Product Specialist, NTIS, U.S. Department of Commerce, Springfield, Virginia 22161, Area Code: 703-557-4634.

Users wishing to access a time series data base which includes selected BLS data such as employment status of the population by age and sex, major unemployment indicators, workers on nonagricultural payrolls by industry, productivity and labor cost, and labor turnover, as well as a large number of economic series from other sources, should contact Charlotte Boschan or Ann Wood concerning access to the CITYDATA file. They may be reached at: CITYBASE, F.D.R. Station, Box 5294, N.Y., NY 10022.

Many BLS data files may be purchased for use at one’s own computer installation by contacting the U.S. Department of Labor, Bureau of Labor Statistics, Division of Planning and Financial Management, 441 G Street, N.W., Washington, DC 20212. Prices begin at $60.00 per file for straight copies. Exact prices and fuller descriptions of these data products appear in BLS Data Bank Files and Statistical Routines (Report 507). The remaining sections of this column contain more specific descriptions of the individual data files produced and distributed by BLS.
Time Series Data Files

Currently fifteen types of time series data files are available. Each series represents a discrete variable for which observations are available over time.

1. The first of these is the Current Employment and Unemployment Analysis file, data for which are obtained from the Bureau of the Census Current Population Survey. The file includes 1,800 monthly time series, many of which begin as early as 1948. Current data are published monthly in Employment and Earnings. Nearly 500 revised adjusted time series appear in this publication each February and summary reports appear regularly in the Monthly Labor Review.
2. BLS cooperates with State agencies in collecting monthly mail survey data from a sample of employer units in all nonagricultural activities to produce Industry Employment, Hours, and Earnings statistics. This series includes average hourly and weekly earnings, average weekly hours, etc., for over 400 industries. Over 1,200 national employment series are published monthly in Employment and Earnings and in summary in the Monthly Labor Review, and over 2,500 are in the data file. Most of these begin in 1958 but some are available from as early as 1909.
3. Similar data are also available for states and major labor areas. Over 18,000 time series are developed, with up to 170 industries reported in the larger states and some industry detail at the 4-digit SIC level for recent years.
4. From the quarterly tax reports submitted to State employment security agencies by employers subject to State and Federal unemployment compensation programs, information is obtained about monthly employment and quarterly wages and employment contributions. Two files are created, a historical and a 6-quarter file. The former begins in 1975 and includes national and state summaries for 84 2-digit industries and national summaries for 423 3-digit and 451 4-digit manufacturing industries. The latter provides state summaries of similar data for the most recent six quarters for 84 2-digit, 423 3-digit and 1,004 4-digit industries. A total of 26,300 time series are produced. National and state summaries for broad industry divisions and selected 3-digit industry groups appear in the quarterly Employment and Wages.
5. Monthly series indicating the total civilian labor force, total employment, total unemployment and the unemployment rate are available for approximately 6,000 geographic areas including counties and cities with populations of over 25,000. These series begin in 1973 and subsets of the data are published monthly in Employment and Earnings, State and County Employment and Unemployment Rates for State and CETA Area, Employment and Unemployment. Quarterly data appear in Unemployment Rates for States and Identifiable Local Governments.
6. The Industry Labor Turnover program is another Federal-State venture. The 1,100 time series in the data file partially duplicate those which appear in Employment and Earnings, United States, 1909-. The file includes annual averages for new hires, quits and layoffs in 215 manufacturing industries and seven mining and communications industries, beginning in some cases in 1930 and in others in 1943 and 1958.
7. The Consumer Price Index (CPI) measures average change in the price of a market basket of goods and services bought by urban wage earners and clerical workers for day-to-day living. Prices of about 400 items are collected monthly, quarterly, semiannually or annually in a 56-area sample. Findings appear in the Consumer Price Index, the CPI Detailed Report, and the Monthly Labor Review and are available in machine-readable form consisting of over 1,200 time series. The 1978 revision updates (1) the weights assigned to the various spending categories such as food, clothing, shelter and medical care, (2) the sample of items priced each month in the ongoing CPI, and (3) the sample of retail stores; it also (4) modernizes the conceptual bases and statistical methods employed.
8. Data provided by individual producers of manufactured and processed goods, supplemented by information from trade publications, are used to construct the monthly Wholesale Price Index (WPI). The classification used by this Index does not match any other standard classification. The WPI time series for 2,700 individual commodities and a number of commodity groupings all go back at least to 1947. Current and summary WPI data are published in the monthly and annual Producer Prices and Price Indexes (formerly: Wholesale Prices and Price Indexes) and in the Monthly Labor Review.
9. The Industry Price Index uses the 1972 Standard Industrial Classification in aggregating WPI data to 5-digit product classes. Time series data files for selected mining and manufacturing industries and for over 500 product groups are also available. Publication of these data usually accompanies the WPI.
10. Data for the Export and Import Price Indexes cover 47 percent of the value of U.S. exports and 15 percent of the value of U.S. imports. Although there are plans to extend coverage to all major product categories within the next few years, current product data are concentrated in the following categories: machinery and equipment, chemicals, crude materials, and some intermediate products including iron and steel, lumber and paper. Price data are obtained directly from exporters and importers residing in the U.S. and represent transaction prices in the third month of each calendar quarter. Price indexes are available at the 4-digit and 5-digit levels and for higher-level aggregates of the Standard International Trade Classification System. The data file contains about 100 series on a quarterly basis beginning in 1974. Data for the most recent five quarters are published in U.S. Export and Import Price Index.
11. A special file on imports contains heretofore unpublished data. These data are based on the value and quantity of imported commodities classified by the Tariff Schedules of the United States Annotated (TSUSA). Quarterly and annual measures starting in 1968 are available for about 10,000 TSUSA imported commodity classes, along with values for 618 5-digit and 372 4-digit SIC-based commodity import groups. Annual rates of imports to new supply by 4-digit SIC-based manufactured commodity import group are also available.
12. Two separate files are available, both containing indexes of labor productivity, unit labor cost and related measures. One, covering the private economy, contains fourteen measures for thirteen industrial classifications; the second, covering manufacturing and nonfinancial corporations, contains similar levels and indexes for four industry sectors: total, durable and non-durable manufacturing, and nonfinancial corporations. Most of these series are quarterly and begin in 1947. These data appear regularly in the Monthly Labor Review, Employment and Earnings and the annual Handbook of Labor Statistics.
13. Productivity data are also available in two additional files. The file covering private industry contains such measures as output, production worker hours, nonproduction worker hours, etc., for 60 industries. There are approximately 400 annual series, about half beginning in 1947. Data are published in the annual Productivity Indexes for Selected Industries. The file covering the federal government contains such measures as output, employee years, compensation, etc., for 28 functional areas. All of the 2% annual time series begin in 1967. Data are published in the Handbook of Labor Statistics.
14. BLS produces a variety of international comparative measures, mainly for western industrial countries. In the course of so doing, numerous small files containing foreign statistics are generated. These files include data on such items as productivity, compensation, unit price, capital investments and industrial disputes. Typically the measures produced are published in the Monthly Labor Review and in the Handbook of Labor Statistics.
15. Data related to employment costs currently include quarterly straight-time hourly earnings for eight major occupation groups, five major industries, four geographic areas, union/nonunion status, and metropolitan/nonmetropolitan location, but expansion plans call for inclusion of benefit costs as well and a shift to a monthly index. Published data appear in Current Wage Developments. The series began in December, 1975.

Annual Cross-Section Datafile

In addition to these time series files a cross-sectional annual file is available from BLS. Occupational Safety and Health data are published by the individual states. The BLS data file for 1975 contains data from 44 states and 5 jurisdictions. Aggregations vary from 2-digit to 4-digit manufacturing industries but cover, for each industry cell, both absolute values and rates for injuries, illnesses, and the total of injuries and illnesses for total cases, total lost work day cases, days of restricted work activity, etc., and at the “all industry” level similar variables for seven separate illness categories.

Matrix Datafiles

A somewhat different BLS data file product is the Industry-Occupational Matrix, which contains data for 260 industries and industry groups and 425 occupations and occupational groups. For each industry-occupational cell, three data items are provided: employment level, ratio of employment to occupation total, and ratio of employment to industry total. Matrices are available for 1970, 1974, and 1985 (projected figures). These matrices were published as a part of the series Tomorrow’s Manpower Needs, as Volume IV, Revised: The National Industry-Occupational Matrix and Other Manpower Data (Bulletin 1737).

As part of the program for Economic Growth BLS makes available two files based on data for 129 industries: (1) input-output matrices, final demand components, employment and output for 1963, 1970 and projected for 1980 and 1985; (2) annual employment and output series by industry for 1958 through 1974.
These data were published in the March and November 1976 issues of the Monthly Labor Review and have been the basis of a number of BLS Bulletins concerned with estimating and projecting economic growth, including 1672, 1831 and 1832.

Micro Datafiles

Monthly microdata files containing characteristics of the insured unemployed are available beginning in October, 1969. These contain a sample of at least 150,000 individuals seeking benefits during the month. Each record contains codes for state, month, year, unemployment office, claim type (regular, interstate), number of weeks in this week’s claim, duration of continuous unemployment, sex, color (white, black and other), industry (2-digit SIC), occupation (3-digit DOT), occupational complexity, and inflation factor (weight showing contribution of record to universe). Summaries of these data are published by the Employment and Training Administration of the U.S. Department of Labor in Unemployment Insurance Statistics.

The other publicly available BLS microdata files are those based on the 1960-1961 and the 1972-1973 Surveys of Consumer Expenditures. Data from the earlier survey are contained on a single General Purpose Tape which contains 42 items identifying the survey consumer units, 125 expenditure items, 19 items describing changes in assets and liabilities and 12 income items. Summary data were published in several BLS statistical reports including 237-1 through 237-27, 237-29, 237-34 through 237-38, 237-51 through 237-78 and 237-84 through 237-93, as well as in BLS analytical reports 238-1 through 238-16.

The more recent project comprises two surveys with separate samples and different data collection methods. It is now available as three separate files. Each file contains data for a cross-section of noninstitutional families from which expenditures and income
data and demographic and economic characteristics were collected. The Diary Public Use Tape also contains detailed data on individual family weekly expenditures for food, alcoholic beverages, tobacco, personal care, housekeeping supplies, nonprescription drugs, gasoline, and heating and cooking fuels. The Interview Public Use Tapes, which cover most consumption categories, were issued in two forms: one with expenditure groupings and one with considerably more expenditure item detail. For example, whereas the first provides total expenditures for alcoholic beverages, the second provides four separate categories of such purchases, differentiating between types of beverages and places of purchase. Data from the Diary Survey have appeared in various news releases, in Reports 448-1 through 448-3 and in Bulletin 1959 (1977). Data from the Interview Survey have appeared in Reports 455-1 through 455-4 and in Bulletin 1985.

Regardless of the mode in which one accesses or uses these data, their existence in machine-readable form provides more people with greater opportunities to retrieve and analyze them than was possible when the only public availability was in printed form. The flexibility and capacity of this medium, coupled with the capabilities of computer software, open new areas for research and planning.
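
As one illustration of the kind of analysis that machine-readable microdata permits, the inflation factor carried on each insured unemployed record (the weight showing the record's contribution to the universe) can be summed to estimate universe totals for any coded characteristic. The following is a minimal sketch in Python under stated assumptions: the record layout, the field names (`state`, `sex`, `weight`) and the sample values are hypothetical and are not taken from the BLS tape documentation.

```python
from collections import defaultdict

# Hypothetical sample records. The field names and values are invented for
# illustration and do not reflect the actual BLS tape layout.
SAMPLE_RECORDS = [
    {"state": "NJ", "sex": "F", "weight": 20.0},
    {"state": "NJ", "sex": "M", "weight": 25.0},
    {"state": "NY", "sex": "F", "weight": 30.0},
    {"state": "NY", "sex": "F", "weight": 10.0},
]


def weighted_totals(records, field):
    """Sum the inflation factors of the records in each category of `field`.

    Each sample record stands for `weight` members of the universe, so the
    sum of the weights estimates the universe count for that category.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record[field]] += record["weight"]
    return dict(totals)


def weighted_shares(records, field):
    """Return each category's estimated share of the universe."""
    totals = weighted_totals(records, field)
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No weight at all: report zero shares rather than dividing by zero.
        return {category: 0.0 for category in totals}
    return {category: total / grand_total for category, total in totals.items()}


if __name__ == "__main__":
    print(weighted_totals(SAMPLE_RECORDS, "state"))
    print(weighted_shares(SAMPLE_RECORDS, "sex"))
```

Counting records without their weights would describe only the sample; summing the weights is what turns a tabulation into an estimate for the full population of claimants.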