Measuring scholarly use of government information: An altmetrics analysis of federal statistics


GOVINF-01092; No. of pages: 7; 4C: Government Information Quarterly xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Government Information Quarterly journal homepage: www.elsevier.com/locate/govinf

Tara Das
Government Information Librarian, Columbia University, Lehman Social Sciences Library, 420 W. 118th St., New York, NY 10027, USA

Article info

Article history: Received 4 December 2014; received in revised form 13 May 2015; accepted 19 May 2015; available online xxxx

Keywords: Citation analysis; Government information; Altmetrics; Statistics; Content analysis

Abstract

Purpose: This paper examines how federal statistics are used in scholarly research via a new type of citation analysis that leverages the strengths of information aggregators, like Altmetric LLP, in looking for evidence of government information use beyond traditional citations and references. In this citation analysis, abstracts were examined.

Results: Drawing on a dataset of articles aggregated via Altmetric Explorer, a querying interface provided by Altmetric LLP, content analysis was used to 1) determine the distribution of federal statistics incorporated in scholarly studies, and 2) qualitatively understand the particular ways in which studies incorporated federal statistics. The dominant source of federal statistics was the National Center for Health Statistics (NCHS), followed by the Census Bureau and then the Bureau of Labor Statistics. Prevalent qualitative themes underlying the studies in this dataset included mortality and population studies, linked datasets, international studies, and critical studies (i.e. presenting alternative measures for federal statistics).

Conclusions: When querying studies referencing one or more of the principal US statistical agencies in Altmetric Explorer, almost all studies in the final dataset cited these agencies because they had cited federal statistics. This finding need not have been the case, however. A separate study on the use of federal statistics in scholarly research will compare altmetrics to traditional citation analysis. Preliminary results from Google Scholar, using traditional citations, found non-dataset publications to be the most frequently cited titles from NCHS and the Census Bureau.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

The research agenda underlying this study is to measure and analyze the production, dissemination, and use of government information in the United States. Given the wealth and variety of information that the US government produces, several studies are planned to address this agenda in order to understand the following: How much government information is produced? What exactly is produced? Which particular agencies or bureaus are most prolific in producing government information? How is government information used by others (e.g. researchers, activists, journalists)? Which agencies or bureaus have their information more frequently used by others? These questions are complex and can be measured in different ways, focusing on specific types and users of government information. The diagram below visualizes the relationships among the primary questions.


Within the research agenda, this article focuses on how researchers use federal statistics, produced by the principal statistical agencies, in their studies, and on which statistical agencies have their information used more frequently. It will use a new type of citation analysis that leverages

http://dx.doi.org/10.1016/j.giq.2015.05.002 0740-624X/© 2015 Elsevier Inc. All rights reserved.

Please cite this article as: Das, T., Measuring scholarly use of government information: An altmetrics analysis of federal statistics, Government Information Quarterly (2015), http://dx.doi.org/10.1016/j.giq.2015.05.002


the strengths of altmetrics in looking for evidence of government information use beyond traditional citations and references. It will also apply content analysis to understand the ways in which federal statistics are incorporated into scholarly studies, i.e. how this government information was used, which is a new contribution to the metrics literature.

2. Background

Federal statistics are a major category of US government information. Overall, US government information “constitute[s] a great library covering almost every field of human knowledge and endeavor” (Schmeckebier, Eastin, & Brookings Institution, 1969, p. 1). In particular, federal statistics “present statistical pictures of conditions and afford bases for measuring social and economic change” (Ibid.), such as unemployment rates, high school graduation rates, leading causes of death, and total population counts. These kinds of statistics provide context for identifying problems, allocating public resources, and assessing program effectiveness. Importantly, many federal statistics that the US government produces cannot be replicated by nonprofit organizations or universities, because individuals and organizations can be legally required to report particular events (e.g. births or deaths) or to complete questionnaires (e.g. Decennial Census). As such, federal statistics are heavily used in scholarly research as raw data for analysis.

Highlighting the importance of federal statistics to scholarly research, journalism, and public policy, a special issue of the Annals of the American Academy of Political and Social Science was dedicated to “The Federal Statistical System: Its Vulnerability Matters More than You Think” in September 2010.
In the introductory article, Prewitt (2010) states,

“In particular - and the justification for this volume of The Annals - note that the empirical social sciences, from which we get much of the social knowledge relevant to public policies, would not have reached current levels of maturity in the absence of public statistics…federal statistics are indispensable to the scientific investigation of a significant number of social processes, structures, and behaviors. These investigations in turn contribute social knowledge found to be useful in policy design, implementation, evaluation, and adjustment and in public understanding of how well things are working” (pp. 7, 14).

Prewitt goes on to note that a JSTOR search of leading social science journals for articles from 2008–09 demonstrated that more than half of them used statistics from five US government agencies: the Bureau of Economic Analysis (BEA), the Bureau of Labor Statistics (BLS), the U.S. Census Bureau, the National Center for Health Statistics (NCHS), and the National Oceanic and Atmospheric Administration (NOAA). Unfortunately, Prewitt does not specify the methods used to collect these articles and analyze their citations. Notably, the same omission appears in other studies that measured scholarly use of government information.

There have been a few citation studies of government information, that is, studies that count and analyze citations in a specific journal or group of journals. For instance, Goehlert (1979) analyzed the use of US and international government documents in articles published in the journal International Organization from 1972–1976, and found that the most frequently cited documents were issued by the State Department and Congress. Brill (1990) also conducted a citation analysis within international relations journals for 1964, 1974, and 1984, which indicated that 46% of all documents cited were from the US government.
While detail is provided on the selection process for journals and disciplines in these citation studies (also see Hernon & Shepherd, 1983; Hogenboom, 2002), sufficient detail on how US government documents were identified in the citations is lacking.

As another weakness of traditional citation analysis, it is evident that many scholarly articles do not cite government information in a uniform fashion. They may only cite the government source within the body of the article, and not in the references as a formal citation. When formal citations exist, the authoring agency cited for a particular information product is also not consistent. For instance, either the Centers for Disease Control and Prevention (CDC) or the National Center for Health Statistics (a center within the CDC) may be cited as the author of annual vital statistics reports. The lack of consistency in dataset citation practices compounds the difficulty of using traditional citation analysis to measure use of federal statistics. Moreover, inconsistencies in citing government information make it even more critical for authors of citation studies to document how they identified articles for citation analysis.

In addition, how scholarly articles incorporate government information, and which federal agencies are more frequently cited, are research questions that must be analyzed in order to better understand how government information is used. Citation studies that have focused on government information, infrequent in themselves, have primarily reported raw counts of citations to US government information and their percentage of all citations. In one exception, Hernon and Shepherd (1983) did find that the majority of US government information in social science citations was issued by the Census Bureau, Congress, the Federal Bureau of Investigation, the National Center for Health Statistics, and the Office of the Federal Register. While they point out that social scientists do refer to government information for federal statistics, their study did not discern how many of the citations were to datasets. Given that federal statistics are used in scholarly research, and are in the form of datasets, traditional citation analysis may fail to capture their usage.
Datasets are infrequently cited in bibliographies and reference lists (though there are organizations attempting to build standards for data citation, like DataCite and the Inter-university Consortium for Political and Social Research). “Traditional metrics have generally dealt with journals or articles and not measured other significant research output like blog posts, slideshows, datasets, and other important scholarly dialog” (Galligan & Dyas-Correia, 2013, p. 56).

Fortunately, alternative metrics (or altmetrics) have been developed to help address these issues. They employ different ways of capturing citations, since so many scholarly products are now disseminated in electronic journals and on other Internet publishing platforms. The emphasis thus far in altmetrics studies has been on ways in which altmetrics can capture online scholarly communication and the impact of scholarly articles in social media (e.g. via number of tweets and retweets on Twitter, number of Facebook or blog posts). “In the online environment, we track almost every movement: clicks, page views, and interactions as well as the way we share things with others. A great opportunity lies in the capture of the resulting data trail and building meaningful layers of insights onto it” (Galligan & Dyas-Correia, 2013, p. 58).

In addition, libraries can use this data for collection development purposes, particularly for electronic resources available online like datasets. For these resources, circulation statistics, or even number of hits, do not provide much help in understanding which ones should be highlighted in research guides and/or prioritized for digital preservation.

“Using publicly available data gathered from APIs from broad and sector-specific networks is something that cannot be ignored. Altmetrics present an integrated view of how unit of content or one researcher has moved across the digital landscape in a series of actions or digital conversations” (Galligan & Dyas-Correia, 2013, p. 58).
As Priem, Piwowar, and Hemminger (2012b) note, “In growing numbers, scholars are integrating social media tools like blogs, Twitter, and Mendeley into their professional communications. The online, public nature of these tools exposes and reifies scholarly processes once hidden and ephemeral.” Moreover, citations are slower to accumulate than


social media metrics in gauging research impact and influence. Following this, studies have examined whether social media-based altmetric scores for scholarly articles correlate with traditional citation counts, and, in general, have found moderate correlations (Priem et al., 2012b; Thelwall, Haustein, Larivière, & Sugimoto, 2013). Furthermore, citations “overlook new scholarly forms like datasets, software, and research blogs that fall outside of the scope of citable research objects” (Priem, Groth, & Taraborelli, 2012a, p. 1).

The altmetric tools that have been developed to aggregate articles based on online dissemination index more than citations alone. Users can query the abstract and text as well. This matters for the present study because federal statistics, when the datasets serve as the basis for a scholarly research study, are often cited within the title, abstract, or text of the article. In particular, the scientific abstract would mention datasets, as it describes the materials and methods used in the study. Though there are concerns with how altmetrics defines impact and whether it is possible to manipulate impact, the strengths of information aggregation based on alternative metrics to citation (e.g. downloads, usage, mentions) are being leveraged for this study. As Liu and Adie (2014) of Altmetric LLP point out, “Altmetrics tools aggregate online attention rapidly, and within days of an article's publication, researchers are often able to see the online conversations about and mentions of their work that would otherwise have been difficult and time-consuming to find” (Enriching Altmetrics for researchers section, para. 1).

3. Material and methods

3.1. Data collection

Altmetric LLP is one company that collects and measures altmetrics, with the objective of tracking and analyzing online activity around scholarly literature. They state,

“At Altmetric, our approach to collecting and measuring altmetrics has been focused at the article level. On a daily basis, we capture approximately 12,000 online mentions (altmetrics data) of individual scholarly articles and data sets by scanning through social media, blogs, mainstream news outlets, YouTube and various other sources. Such online mentions of articles and data sets range widely in complexity, from simple shares, e.g., a tweet containing a link to a scientific article, to more comprehensive analyses, e.g., blogs and online journal clubs” (Liu & Adie, 2014, p. 153).

Altmetric LLP started this process in July 2011, by pulling in article and dataset mentions from social media (e.g. Twitter, Facebook, Google+, and a manually curated list of blogs), traditional media that is disseminated online (e.g. New York Times, Scientific American, electronic journals), and online reference managers like Mendeley. The dataset used in this study was created with their web application, Altmetric Explorer, which permits querying and export of data. The “Altmetric Explorer lets you monitor, search and measure conversations” about scholarly articles (Altmetric LLP, n.d.).

To obtain a broad set of results, articles were retrieved via Altmetric Explorer in August 2014, using a keyword search for each of the primary federal statistical agencies. According to the United States Government Accountability Office (2012), there are 13 federal agencies that include statistical work in their core activities. They are located in departments that report to the president.

1) Statistics of Income (Department of the Treasury)
2) Bureau of Labor Statistics (Department of Labor)
3) National Center for Education Statistics (Department of Education)
4) National Center for Health Statistics (Department of Health and Human Services)
5) Office of Research, Evaluation, and Statistics (Social Security Administration)
6) Energy Information Administration (Department of Energy)
7) Bureau of Transportation Statistics (Department of Transportation)
8) Economic Research Service (Department of Agriculture)
9) National Agricultural Statistics Service (Department of Agriculture)
10) Census Bureau (Department of Commerce)
11) Bureau of Economic Analysis (Department of Commerce)
12) Bureau of Justice Statistics (Department of Justice)
13) National Center for Science and Engineering Statistics (National Science Foundation)

The full name of each statistical agency was placed in quotes for the keyword search to ensure relevant results. This meant that the agency would be referenced within the research article abstract, which ensured that the agency's statistical data was central to the scholarly article. The full name of each agency proved to be a more appropriate keyword phrase than the abbreviation. For instance, “Bureau of Labor Statistics” was searched instead of “BLS” since BLS also stands for Basic Life Support and retrieves articles related to cardiac arrest. Likewise, a search on “BJS” retrieves articles from the British Journal of Surgery (also abbreviated as BJS) rather than articles referencing the Bureau of Justice Statistics.

In total, 196 articles were retrieved and exported. Two articles that pertained to the Italian Census Bureau were removed. All four articles that were retrieved with the keyword search “Statistics of Income” had to be removed from the dataset as they related to the European Union Statistics on Income & Living Conditions (EU-SILC) data. Five articles that were duplicates of other articles were removed. The final dataset contained 185 articles. Each article was then manually reviewed to record the author affiliation (government or non-government), the specific data source, and whether the data source was included in the list of references.
This additional information was added to the dataset.

3.2. Data analysis

The dataset was imported into QSR International's (2014) NVivo 10 qualitative data analysis software for content analysis of the article abstracts, using a grounded theory approach. The purpose of grounded theory as a methodology is to generate theory that is grounded in specific data. The approach was first articulated by Glaser and Strauss (1967). It is carried out through detailed and systematic coding procedures. Via a line-by-line reading of the data, the researcher generates codes. These are categories or concepts that summarize themes found in the data. Open coding is exploratory and produces the initial set of codes, many of which are discarded upon finding that they do not appear frequently. The data is thus translated into numerous codes, which are continuously compared to one another as the researcher hypothesizes about associations between codes. This constant comparative approach is key to grounded theory. Ezzy (2002) states, “Comparisons allow data to be grouped and differentiated, as categories are identified and various pieces of data are grouped together” (p. 90).

The content analysis was based on the abstract and the federal statistics data source. The data source was manually assigned based on the dataset(s) used in each study. Often the agencies were referenced in the study abstracts, but the specific dataset was not always mentioned within the abstract. In those cases, the article's full text was searched to identify the datasets used. Nonetheless, it is interesting how often the federal statistics dataset(s) was mentioned in the abstract but not formally cited in the references of the article. Sixty-three percent of the articles did not include a formal citation (to the statistical agency or its parent department) of the federal statistics used, and 37% did. These numbers support the points made earlier regarding the weakness of traditional citation analysis, which fails to capture what is not included in references.
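The search and screening steps described in Section 3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Altmetric Explorer's own query engine is not reproduced here, the two abstracts are invented, and the helper name `mentions_phrase` is hypothetical. The sketch shows why a quoted full agency name is a safer query than an abbreviation, and it re-derives the final article count from the exclusions reported in the text.

```python
import re


def mentions_phrase(text, phrase):
    """Return True if `phrase` occurs in `text` as a whole phrase, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical abstracts, invented for illustration only.
abstracts = [
    "We use Bureau of Labor Statistics data on workplace injuries.",
    "BLS training improved survival after out-of-hospital cardiac arrest.",
]

# The full agency name matches only the labor-statistics abstract.
full_name_hits = [a for a in abstracts if mentions_phrase(a, "Bureau of Labor Statistics")]

# The abbreviation matches the Basic Life Support abstract instead.
abbreviation_hits = [a for a in abstracts if mentions_phrase(a, "BLS")]

# Screening tally, using the counts reported in Section 3.1.
retrieved = 196
exclusions = {
    "Italian Census Bureau": 2,
    "EU-SILC (from the Statistics of Income search)": 4,
    "duplicates": 5,
}
final_count = retrieved - sum(exclusions.values())

print(len(full_name_hits), len(abbreviation_hits), final_count)
```

Keeping the exclusions in a labeled dictionary makes the path from 196 retrieved articles to the final 185 easy to audit.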


4. Results

4.1. Descriptive statistics

The final dataset contained no articles referencing the Statistics of Income, Bureau of Transportation Statistics, and the National Center for Science and Engineering Statistics. Keeping in mind that five articles resulted from more than one agency search, the remaining frequency distribution is as follows.

Agency                                      Count   Percent of total
Bureau of Economic Analysis                     2              1.08%
Bureau of Justice Statistics                    5              2.70%
Bureau of Labor Statistics                     27             14.59%
Census Bureau                                  69             37.30%
Energy Information Administration               3              1.62%
Economic Research Service                       5              2.70%
National Agricultural Statistics Service        1              0.54%
National Center for Education Statistics        3              1.62%
National Center for Health Statistics          70             37.84%
Total                                         185            100.00%
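As a quick check on the frequency distribution, the percent-of-total column can be recomputed from the counts. The sketch below is illustrative only; the counts are those reported in the table, and the variable names are my own. Shares are rounded to two decimals, so they may not sum to exactly 100.

```python
# Article counts per agency, as reported in the frequency distribution.
counts = {
    "Bureau of Economic Analysis": 2,
    "Bureau of Justice Statistics": 5,
    "Bureau of Labor Statistics": 27,
    "Census Bureau": 69,
    "Energy Information Administration": 3,
    "Economic Research Service": 5,
    "National Agricultural Statistics Service": 1,
    "National Center for Education Statistics": 3,
    "National Center for Health Statistics": 70,
}

total = sum(counts.values())

# Percent of total for each agency, rounded to two decimals.
shares = {agency: round(100 * n / total, 2) for agency, n in counts.items()}

# List agencies from most to least frequent.
for agency, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{agency:<42}{n:>4}{shares[agency]:>8.2f}%")
```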

Of the 185 articles, 52 (28.11%) were written by authors with a local, state, or federal government affiliation. In addition, 63.2% of the articles were published between 2011 and 2014. It is interesting that this percentage is not higher, given the selection bias towards scholarly research disseminated in social media and that Altmetric LLP began aggregating articles in 2011.

4.2. Content analysis

An initial word frequency count on the abstracts found that the five most frequently appearing words in the dataset were data (n = 258), health (236), cancer (166), rates (147), and population (133). Mortality (n = 125) and national (110) were also commonly used. Since abstracts referencing the National Center for Health Statistics (NCHS) comprised about 38% of the dataset, the results may have been slightly skewed. Another word frequency count was run, this time excluding NCHS abstracts. Data (n = 140) and health (110) remained the two most frequently appearing words, with population (95), study (60), and states (58) rounding out the top five.

The dominant source of federal statistics used in research studies is the National Center for Health Statistics (NCHS), located within the US Department of Health and Human Services. There were 102 references to NCHS datasets (many articles used more than one NCHS dataset as part of the study), 68 references to Census data (with most referencing population estimates by demographics), 20 to Bureau of Labor Statistics (BLS) datasets, 2 to the National Center for Education Statistics, and 1 to the Economic Research Service. In fact, the majority of the studies covered health-related topics, as the word frequencies suggest. Of the 185 studies, 33 were classified as non-health studies. Of these 33 studies, 26 were economic in nature and 7 were not. By economic, I mean studies that examined questions relating to wages, costs, employment, management, and manufacturing.
These studies used data from the Bureau of Economic Analysis, Bureau of Labor Statistics, and the Census Bureau (e.g. Longitudinal Business Database, Census of Manufacturing).

Next, the dominant themes across the studies, identified via grounded theory, will be reviewed. Sample quotations from studies that reflect specific uses of federal statistics will also be provided.

4.2.1. Mortality and population studies

Overall, 68 studies incorporated some kind of mortality measure, drawn from NCHS mortality files, the BLS Census of Fatal Occupational Injuries, or the Census Bureau National Longitudinal Mortality Study. Fifty-one studies incorporated Census population estimates. Thus,

mortality and population data were the most frequently used types of federal statistics in scholarly research, based on altmetrics.

Studies that used mortality data to analyze trends for particular causes of death were common. For instance, in “The burden of sepsis-associated mortality in the United States from 1999 to 2005: an analysis of multiple-cause-of-death data,” the authors state:

“Sepsis is the 10th leading cause of death in the United States. The National Center for Health Statistics' multiple-cause-of-death (MCOD) dataset is a large, publicly available, population-based source of information on disease burden in the United States. We have analysed MCOD data from 1999 to 2005 to investigate trends, assess disparities and provide population-based estimates of sepsis-associated mortality during this period” (Melamed & Sorvillo, 2009, abstract).

Most mortality studies used NCHS data, though its underlying data source (death certificates) is the same source that underlies other federal mortality data. The National Longitudinal Mortality Study, which is disseminated by the Census Bureau, is one such example. For instance, in “Examining The Long Term Mortality Effects Of Early Health Shocks,” the author used “newly released files from the large, representative National Longitudinal Mortality Study to explore the mortality effects of the 1918 influenza pandemic for those in utero” (Fletcher, 2014, abstract).

4.2.2. Linked data studies

Whereas mortality data in these studies were not linked to other datasets, Census population data was almost always used in linkage with other data, as opposed to being analyzed in itself. That is, these were “linked data” studies where multiple datasets were combined for analysis, as opposed to relying on a single dataset. For example, in “Can Economic Deprivation Protect Health? Paradoxical Multilevel Effects of Poverty on Hispanic Children's Wheezing,” the authors assert:

“We employ hierarchical logistic regression modeling to test if economic deprivation presents respiratory health risks or benefits to Hispanic children living in the City of El Paso (Texas, USA) at neighborhood- and individual-levels, and whether individual-level health effects of economic deprivation vary based on neighborhood-level economic deprivation. Data come from the US Census Bureau and a population-based survey of El Paso schoolchildren” (Collins, Kim, Grineski, & Clark-Reyna, 2014, p. 7856).

In this study, the percentage of households below the poverty line by census tract, taken from the American Community Survey and the Decennial Census, was used to estimate neighborhood-level economic deprivation. Estimating socioeconomic variables by geographic area and then linking them to other variables in a study was a popular use of Census data, for analyzing trends by gender, race, income, or geographic area.

Studies using Census data also linked current or future population estimates to illnesses or phenomena for public health and forecasting studies. For example, in “Population Aging And Emergency Departments: Visits Will Not Increase, Lengths-Of-Stay And Hospitalizations Will,” the authors summarize:

“With US emergency care characterized as ‘at the breaking point,’ we studied how the aging of the US population would affect demand for emergency department (ED) services and hospitalizations in the coming decades. We applied current age-specific ED visit rates to the population structure anticipated by the Census Bureau to exist through 2050. Our results indicate that the aging of the population will not cause the number of ED visits to increase any more than would be expected from population growth” (Pallin, Allen, Espinola, Camargo, & Bohan, 2013, p. 1306).
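The kind of linkage described in this section, attaching an area-level Census estimate to individual records from another source, can be sketched as follows. Everything in the example is hypothetical: the tract identifiers, the poverty percentages, the survey records, and the function name `link_by_area` are invented for illustration and do not come from any of the studies cited.

```python
# Hypothetical tract-level estimates: percent of households below the poverty line.
tract_poverty = {"tract_A": 31.5, "tract_B": 12.0}

# Hypothetical individual-level survey records, each tagged with a census tract.
survey = [
    {"child_id": 1, "tract": "tract_A", "wheezing": True},
    {"child_id": 2, "tract": "tract_B", "wheezing": False},
    {"child_id": 3, "tract": "tract_C", "wheezing": True},  # tract missing from estimates
]


def link_by_area(records, area_values, key="tract", new_field="pct_below_poverty"):
    """Attach an area-level value to each record; unmatched records get None."""
    linked = []
    for record in records:
        merged = dict(record)  # copy so the original records are not modified
        merged[new_field] = area_values.get(record[key])
        linked.append(merged)
    return linked


linked = link_by_area(survey, tract_poverty)

# Records that could not be linked should be reviewed before analysis.
unmatched = [r["child_id"] for r in linked if r["pct_below_poverty"] is None]
print(linked[0], unmatched)
```

Keeping unmatched records visible, rather than silently dropping them, is one simple way to document how much of a dataset the linkage actually covers.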


Including Census data, 73 of the studies in the altmetric dataset contained linked data. Interestingly, linked data studies were not removed from the dataset as duplicates across agency keyword searches, which points both to the variety of terms by which some federal statistics are described and to how agencies use each other's data to create agency-specific measures. For instance, in “Do Immigrants Work in Riskier Jobs,” the authors state,

“This study combines individual-level data from the 2003–2005 American Community Survey with Bureau of Labor Statistics data on work-related injuries and fatalities to take a fresh look at whether foreign-born workers are employed in more dangerous jobs” (Orrenius & Zavodny, 2009, p. 535).

In this study, Census Bureau data is used, but it is the American Community Survey, the specific instrument, that is cited. In addition, the Bureau of Labor Statistics (BLS) data on work-related injuries and fatalities refers to the Census of Fatal Occupational Injuries (CFOI). While CFOI is a BLS program, the data is derived from death certificates, which are the same source used for National Center for Health Statistics (NCHS) mortality files. As another illustration, the study “Incarcerating Death: Mortality in U.S. State Correctional Facilities, 1985–1998” uses Bureau of Justice Statistics and Census population data to measure death rates of working-age prisoners and non-prisoners by sex and race (Patterson, 2010). The Bureau of Justice Statistics derives its own age-specific death rates, using its surveys and reporting programs as well as NCHS mortality files.

Approximately half of the studies coded as using linked data did not use mortality or population data. Instead, they mainly used economic, health care insurance, or health care utilization data from sources such as the Bureau of Economic Analysis, BLS, NCHS, and non-agency sources. For example, BLS Mass Layoff Statistics were linked with the NCHS Youth Risk and Behavior Survey in “Effects of Statewide Job Losses on Adolescent Suicide-Related Behaviors” (Gassman-Pines, Ananat, & Gibson-Davis, 2014). In another linked data study that did not use mortality or population data, “Labor Market Trends Among Registered Nurses: 2008-2011,” the authors assert:

“This study uses recent data from the Bureau of Labor Statistics (BLS) and Registered Nurses (RNs) licensing exam to examine the recession's effect on the RN labor market. It then reports results of a survey of 518 hospital nursing officers conducted in 2008 and 2010 matched with institutional data from the American Hospital Association (AHA)” (Benson, 2012, p. 205).

In addition to being linked with each other, federal statistics datasets are also linked with datasets from other federal and non-federal sources (e.g. state government, non-profit organizations). NCHS mortality datasets were often linked with local and national health datasets in order to ascertain the mortality burden of an illness, disease, or other condition. For instance, one study, “Particulate Air Pollution and Daily Mortality in Steubenville, Ohio,” matched “daily measurements of total suspended particulates by high volume gravimetric sampler” for one metropolitan area to daily deaths as recorded in the NCHS mortality files (Schwartz & Dockery, 1992).

4.2.3. Studies using federal statistics as justification for the importance of their topics

Interestingly, the majority of studies that used datasets from outside agencies did not link these datasets with federal statistics for analysis (27 of 34). They used federal statistics, primarily from the Bureau of Justice Statistics, Bureau of Labor Statistics, or Census Bureau, solely to provide context for their study and justify its importance. These studies generally focused on particular organizational contexts: prisons, hospitals and health care settings, and other workplace settings. For instance, in “Nothing Changes, Nobody Cares: Understanding the Experience of Emergency Nurses Physically or Verbally Assaulted While Providing Care,” the authors assert, “According to data from the Bureau of Labor Statistics, the most common source of nonfatal injuries and illnesses requiring days away from work in the health care and social assistance industry was assault on the health care worker” (Wolf, Delao, & Perhats, 2014, p. 305).

4.2.4. International studies

There were 16 international studies, that is, studies taking place outside the United States, all of which related to health (e.g. nutrition, obesity, HIV/AIDS, tuberculosis, and aging). Eleven of the 16 studies used the NCHS growth references for measuring nutrition levels in children and adolescents. In the 1970s, the World Health Organization adopted the NCHS references for international use and dissemination. According to the World Health Organization, “The designation of a child as having impaired growth implies some means of comparison with a ‘reference’ child of the same age and sex. Thus, in practical terms, anthropometric values need to be compared across individuals or populations in relation to an acceptable set of reference values” (World Health Organization, n.d., The international reference population, para. 1).

As an example, the authors in “Prevalence of and Risk factors for Stunting among School Children and Adolescents in Abeokuta, Southwest Nigeria” state:

“The objectives of the study were to determine the prevalence of and risk factors associated with stunting among urban school children and adolescents in Abeokuta, Nigeria. Five hundred and seventy children aged 5–19 years were selected using the multi-stage random sampling technique. Stunting was defined as height-for-age z-score (HAZ) of < −2 standard deviation (SD) of the National Center for Health Statistics reference” (Senbanjo, Oshikoya, Odusanya, & Njokanma, 2011, p. 364).
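The stunting criterion quoted above can be made concrete with a short sketch. It assumes the simplest form of a z-score, the child's height minus the reference median for that age and sex, divided by the reference standard deviation; the published growth references may compute z-scores differently. The function names and the reference values used here are hypothetical and are not taken from the NCHS tables.

```python
def height_for_age_z(height_cm, ref_median_cm, ref_sd_cm):
    """Return a height-for-age z-score relative to a reference population.

    Assumes a plain (value - median) / SD definition, which is a
    simplification of how growth references are applied in practice.
    """
    if ref_sd_cm <= 0:
        raise ValueError("reference SD must be positive")
    return (height_cm - ref_median_cm) / ref_sd_cm


def is_stunted(haz, cutoff=-2.0):
    """Apply the HAZ < -2 SD criterion quoted in the text."""
    return haz < cutoff


# Hypothetical child and hypothetical reference values (illustration only).
haz = height_for_age_z(height_cm=118.0, ref_median_cm=130.0, ref_sd_cm=5.0)
print(round(haz, 2), is_stunted(haz))
```

Because the cut-off is defined relative to the reference population, switching to a different reference changes which children fall below it, which is the issue examined by the critical studies in Section 4.2.5.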
As another illustration, in “Secular changes in the height of Polish schoolboys from 1955 to 1988”, the authors state, “We analyzed the changes in height of boys, aged 7-18 years, from surveys conducted in 1955, 1966, 1978 and 1988. Data for height were converted to Z-scores using the LMS method and the 2000 National Center for Health Statistics reference” (Bielecki, Haas, & Hulanicka, 2012, p. 310).

4.2.5. Critical studies

There were 27 studies that were critical of federal statistics. By critical, I mean studies that investigated alternative ways of measuring federal statistics (often using other government data sources) to assess the accuracy of the federal statistics themselves. They were divided roughly evenly between health and economic issues. Most of these critical studies linked federal data to comparable federal and/or state data, whereas only one study, “Medical Care Price Indexes for Patients with Employer-Provided Insurance: Nationally Representative Estimates from MarketScan Data,” relied solely on a nongovernment dataset to assess accuracy in federal statistics (Dunn, Liebman, Pack, & Shapiro, 2013).

For example, 4 critical studies focused on the accuracy of BLS reporting in its Survey of Occupational Injuries and Illnesses. One of these studies, “An estimate of the U.S. Government's undercount of nonfatal occupational injuries,” reviewed injury data from government and self-employment sources to test the hypothesis that the exclusion of certain occupation categories, such as government workers and the self-employed, led to significant under-capture of occupational injuries (Leigh, Marcin, & Miller, 2004). Another study, “Injury classification agreement in linked Bureau of Labor Statistics and Workers' Compensation data,” examined the level of agreement in estimates of occupational injury between the BLS Survey of Occupational Injuries and Illnesses and state Workers' Compensation claims (Wuellner & Bonauto, 2014).

Please cite this article as: Das, T., Measuring scholarly use of government information: An altmetrics analysis of federal statistics, Government Information Quarterly (2015), http://dx.doi.org/10.1016/j.giq.2015.05.002


Another 4 studies addressed different ways of measuring BLS price indexes and employment data. The remaining 4 studies that were considered economic referred to the Current Population Survey, which is conducted by the Census Bureau and BLS. These studies looked at definitions and procedures for measuring poverty, occupational coding, and health insurance.

Of the critical studies that related to health, 6 studies evaluated the impact on malnutrition rates of using NCHS growth reference standards versus other standards, like the World Health Organization reference. Another 4 studies evaluated the quality of survey definitions and responses in NCHS health surveys such as the National Survey of Family Growth and the National Health and Nutrition Examination Survey.

Interestingly, while mortality and population data were major themes when considering the entire dataset of 185 studies, very few studies were critical of NCHS mortality data or Census population data. Only 3 studies looked further into estimates of cause-specific mortality, for measles, liver-related causes, and invasive candidiasis, respectively. Moreover, two of these studies used other NCHS data to evaluate mortality estimates: the National Hospital Discharge Survey in “Epidemiology of invasive candidiasis: a persistent public health problem” (Pfaller & Diekema, 2007) and the National Immunization Program in “Acute Measles Mortality in the United States, 1987–2002” (Hinman et al., 2004). The third study used medical records from the Rochester Epidemiology Project in “Underestimation of Liver-Related Mortality in the United States” (Asrani, Larson, Yawn, Therneau, & Kim, 2013).

5. Results summary

Drawing on a dataset containing articles aggregated via Altmetric Explorer, a querying interface provided by Altmetric LLP, content analysis was used to 1) determine the distribution of federal statistics incorporated in scholarly studies, and 2) qualitatively understand the particular ways in which studies incorporated federal statistics. It was found that the dominant source of federal statistics was the NCHS, followed by the Census Bureau, and then the Bureau of Labor Statistics. Prevalent qualitative themes underlying the studies in this dataset included mortality and population studies, linked datasets, international studies, and critical studies (i.e. presenting alternative measures for federal statistics).

6. Discussion

This study investigated how federal statistics are used in scholarly research, in a new type of citation analysis. It used altmetrics, whose strengths lie in capturing use of non-traditional research outputs like datasets; in looking for evidence of government information use beyond traditional citations and references; and in tracking usage within days after publication. The data collection procedures for the analysis, including methods used to identify the use of federal statistics within articles, were outlined in detail. This transparency was lacking in past citation studies of government information. Content analysis, guided by grounded theory, was employed to qualitatively understand the ways in which federal statistics were incorporated, i.e. how this government information was used by researchers.

6.1. Future directions for government information metrics

When using altmetrics to identify studies that referred to one or more of the principal US statistical agencies, almost all the studies cited these agencies' datasets. This finding need not have been the case, however.
A separate study on the use of federal statistics in scholarly research will compare altmetrics to traditional citation analysis, using Publish or Perish software (Harzing, 2007). This software uses raw citations from Google Scholar in order to calculate the total number of citations and average number of citations per year, among other indicators. For example, querying author = “National Center for Health Statistics” for years 2011 to 2014 in Publish or Perish returned non-dataset publications as the most frequently cited titles (e.g. Prevalence of obesity in the United States, 2009–2010; Health, United States, 2010; and Vitamin D status: United States, 2001–2006). Likewise, author = “Census Bureau” or “Bureau of the Census” returned the publications The Hispanic population: 2010 and Americans with disabilities: 2010 as the most frequently cited titles. The ways in which different metrics, including reference interviews, course syllabi, and social media interactions, affect the analysis will be considered in future studies that seek to understand how government information is used and by whom. These studies are part of the larger research agenda outlined in the introduction.

References

Altmetric LLP. (n.d.). The Altmetric Explorer. Retrieved from http://www.altmetric.com/aboutexplorer.php.
Asrani, S. K., Larson, J. J., Yawn, B., Therneau, T. M., & Kim, W. (2013). Underestimation of liver-related mortality in the United States. Gastroenterology, 145(2), 375–382.
Benson, A. (2012). Labor market trends among registered nurses 2008–2011. Policy, Politics & Nursing Practice, 13(4), 205–213.
Bielecki, E. M., Haas, J. D., & Hulanicka, B. (2012). Secular changes in the height of Polish schoolboys from 1955 to 1988. Economics and Human Biology, 10(3), 310–317.
Brill, M. S. (1990). Government publications as bibliographic references in the periodical literature of international relations: A citation analysis. Government Information Quarterly, 7(4), 427–439.
Collins, T. W., Kim, Y., Grineski, S. E., & Clark-Reyna, S. (2014). Can economic deprivation protect health? Paradoxical multilevel effects of poverty on Hispanic children's wheezing. International Journal of Environmental Research and Public Health, 11(8), 7856–7873.
Dunn, A., Liebman, E., Pack, S., & Shapiro, A. H. (2013). Medical care price indexes for patients with employer-provided insurance: Nationally representative estimates from MarketScan data. Health Services Research, 48(3), 1173–1190.
Fletcher, J. M. (2014). Examining the long term mortality effects of early health shocks. US Census Bureau Center for Economic Studies Paper No. CES-WP-14-19.
Galligan, F., & Dyas-Correia, S. (2013). Altmetrics: Rethinking the way we measure. Serials Review, 39(1), 56–61.
Gassman-Pines, A., Ananat, E. O., & Gibson-Davis, C. M. (2014). Effects of statewide job losses on adolescent suicide-related behaviors. American Journal of Public Health, 104(10), 1964–1970.
Glaser, B., & Strauss, A. (1967). The discovery of grounded theory. Chicago: Aldine Publishing.
Goehlert, R. (1979). A citation analysis of international organization: The use of government documents. Government Publications Review, 6(2), 185–193 (1973).
Harzing, A. (2007). Publish or perish. [Software]. Retrieved from http://www.harzing.com/pop.htm
Hernon, P., & Shepherd, C. A. (1983). Government publications represented in the Social Sciences Citation Index: An exploratory study. Government Publications Review, 10(2), 227–244.
Hinman, A. R., Gindler, J., Tinker, S., Markowitz, L., Atkinson, W., Dales, L., et al. (2004). Acute measles mortality in the United States, 1987–2002. Journal of Infectious Diseases, 189(Supplement 1), S69–S77.
Hogenboom, K. (2002). Has government information on the Internet affected citation patterns?: A case study of population studies journals. Journal of Government Information, 29(6), 392–401.
Leigh, J. P., Marcin, J. P., & Miller, T. R. (2004). An estimate of the US Government's undercount of nonfatal occupational injuries. Journal of Occupational and Environmental Medicine, 46(1), 10–18.
Liu, J., & Adie, E. (2014). Realising the potential of altmetrics within institutions. Ariadne, (72).
Melamed, A., & Sorvillo, F. J. (2009). The burden of sepsis-associated mortality in the United States from 1999 to 2005: An analysis of multiple-cause-of-death data. Critical Care, 13(1), R28.
Orrenius, P. M., & Zavodny, M. (2009). Do immigrants work in riskier jobs? Demography, 46(3), 535–551.
Pallin, D. J., Allen, M. B., Espinola, J. A., Camargo, C. A., & Bohan, J. S. (2013). Population aging and emergency departments: Visits will not increase, lengths-of-stay and hospitalizations will. Health Affairs, 32(7), 1306–1312.
Patterson, E. J. (2010). Incarcerating death: Mortality in US state correctional facilities, 1985–1998. Demography, 47(3), 587–607.
Pfaller, M., & Diekema, D. (2007). Epidemiology of invasive candidiasis: A persistent public health problem. Clinical Microbiology Reviews, 20(1), 133–163.
Prewitt, K. (2010). Introduction: Science starts not after measurement, but with measurement. The Annals of the American Academy of Political and Social Science, 631, 7–16. http://dx.doi.org/10.2307/20744003.
Priem, J., Groth, P., & Taraborelli, D. (2012a). The altmetrics collection. PLoS One, 7(11), e48753.
Priem, J., Piwowar, H. A., & Hemminger, B. M. (2012b). Altmetrics in the wild: Using social media to explore scholarly impact. arXiv preprint arXiv:1203.4745.
QSR International Pty Ltd. (2014). NVivo qualitative data analysis software [computer software].
Schmeckebier, L. F., Eastin, R. B., & Brookings Institution (1969). Government publications and their use (2nd rev. ed.). Washington DC: Brookings Institution.
Schwartz, J., & Dockery, D. W. (1992). Particulate air pollution and daily mortality in Steubenville, Ohio. American Journal of Epidemiology, 135(1), 12–19.
Senbanjo, I. O., Oshikoya, K. A., Odusanya, O. O., & Njokanma, O. F. (2011). Prevalence of and risk factors for stunting among school children and adolescents in Abeokuta, Southwest Nigeria. Journal of Health, Population and Nutrition, 29(4), 364–370.
Thelwall, M., Haustein, S., Larivière, V., & Sugimoto, C. R. (2013). Do altmetrics work? Twitter and ten other social web services. PLoS One, 8(5), e64841.
United States Government Accountability Office (2012). Federal statistical system: Agencies can make greater use of existing data, but continued progress is needed on access and quality issues. Retrieved from http://www.gao.gov/assets/590/588856.pdf
Wolf, L. A., Delao, A. M., & Perhats, C. (2014). Nothing changes, nobody cares: Understanding the experience of emergency nurses physically or verbally assaulted while providing care. Journal of Emergency Nursing, 40(4), 305–310. http://dx.doi.org/10.1016/j.jen.2013.11.006.
World Health Organization. (n.d.). Global Database on Child Growth and Malnutrition. Retrieved from http://www.who.int/nutgrowthdb/about/introduction/en/index3.html.
Wuellner, S. E., & Bonauto, D. K. (2014). Injury classification agreement in linked Bureau of Labor Statistics and Workers' Compensation data. American Journal of Industrial Medicine, 57(10), 1100–1109.
