Journal of Informetrics 11 (2017) 730–744
Do subjective journal ratings represent whole journals or typical articles? Unweighted or weighted citation impact?
William H. Walters
Mary Alice & Tom O'Malley Library, Manhattan College, 4513 Manhattan College Parkway, Riverdale, NY 10471, USA
Article history: Received 12 March 2017; received in revised form 1 May 2017; accepted 1 May 2017.
Keywords: Eigenfactor; Impact factor; Indicator; Journal ranking; SJR; SNIP
Abstract
This study uses journal ratings in criminology and criminal justice, library and information science, public administration, and social work to investigate two research questions: (1) Are stated preference (subjective) journal ratings more closely related to size-dependent citation metrics (eigenfactor and total citations, which represent the impact of the journal as a whole) or to size-independent citation metrics (article influence and CiteScore, which represent the impact of a typical article)? (2) Are stated preference ratings more closely related to unweighted citation metrics (five-year impact factor and source normalized impact per publication, which do not account for the impact of each citing journal) or to weighted citation metrics (article influence and SCImago journal rank, which do)? Within the disciplines evaluated here, respondents' subjective ratings of journals are more closely related to size-independent metrics and weighted metrics. The relative strength of the relationship between subjective ratings and size-independent metrics is moderated by subject area and other factors, while the relative strength of the relationship between subjective ratings and weighted metrics is consistent across all four disciplines. These results are discussed with regard to popularity and prestige, which are sometimes associated with unweighted and weighted citation metrics, respectively.
1. Introduction Journal rating metrics—indicators of journal impact, prestige, reputation, utility, or perceived quality—can be readily classified into two types (Tahai & Meyer, 1999).1 Revealed preference metrics are those that represent actual behaviors such as publishing, indexing, and citing. The most common revealed preference metrics are citation metrics such as the h index, impact factor (IF), source normalized impact per paper (SNIP), eigenfactor (EF), article influence score (AI), and SCImago journal rank (SJR). In contrast, stated preference metrics—also known as subjective or reputational ratings—represent scholars’ opinions or hypothetical behaviors (e.g., “Which of these journals are most important to your work?” “Which carry the most weight in tenure and promotion decisions?”). Stated preference metrics are generally based on surveys of authors or faculty. They are most likely to be found in the social sciences and humanities, where the relationship between citation impact and perceived quality or reputation is not always straightforward. Moreover, stated preference metrics may better represent the opinions of scholars outside the “publish or perish” community—managers, policymakers, teachers, and industrial researchers, for instance (Bollen & Van de Sompel, 2008; Gorraiz & Gumpenberger, 2010; Schlögl & Gorraiz, 2010).
1 Metrics and indicators are used interchangeably here. Although journal rankings is a more common phrase than journal rating, the ratings themselves—rather than the ordinal rankings that result from them—are of primary interest in this study.
This paper uses multiple journal ratings in four disciplines—criminology and criminal justice, library and information science, public administration, and social work—to investigate two research questions:
(1) Are stated preference journal ratings more closely related to size-dependent citation metrics (those that represent the impact of the journal as a whole) or to size-independent citation metrics (those that represent the impact of a typical article)?2 (2) Are stated preference journal ratings more closely related to unweighted citation metrics (those that do not account for the impact of each citing journal) or to weighted citation metrics (those that do account for the impact of each citing journal)?
The first question relies on an important distinction. Size-dependent (whole journal) metrics such as total citations, EF, and the h index represent the number of citations accruing to all the articles in the journal. All else equal, a journal that publishes more articles will gain more citations, a higher EF, and a higher h index. In contrast, size-independent (typical article) metrics such as AI, CiteScore, IF, SJR, and SNIP divide total impact by the number of articles published and are therefore not influenced by journal size.3 For citation metrics, the distinction between size-dependent and size-independent indicators is clear. With stated preference ratings, however, the instructions to survey respondents seldom specify whether they ought to be evaluating entire journals or a typical article within each journal. Section 4 addresses this question—whether scholars (respondents) consider journal size when rating journals. The second question is based on the distinction between unweighted metrics (which assign equal weight to each citation, regardless of the characteristics of the citing journal) and weighted metrics (which assign higher weights to citations that appear in more influential journals). Influence refers to citedness and, in the case of SCImago Journal Rank, network centrality. Although nearly 20 unweighted and weighted citation metrics are available from data download sites such as Journal Citation Reports (JCR), Eigenfactor, CWTS Journal Indicators, SCImago Journal & Country Rank, Scopus Journal Metrics, and Google Scholar Metrics, it is not obvious that either unweighted or weighted metrics are preferable as indicators of impact, prestige, reputation, or perceived quality. Section 5 presents one way of addressing this issue; it identifies the type of indicator, unweighted or weighted, that more closely coincides with the journal ratings assigned by scholars. These research questions are important for at least two reasons. First, investigations such as this can help us understand the relationships between impact, reputation, prestige, and related constructs as they apply to journals. We can use established citation metrics as landmarks, comparing them with stated preference ratings in order to better understand what survey respondents mean when they rate journals. This kind of comparison is possible, however, only if we first address the questions presented here. Second, comparisons of multiple metrics can help us gauge the convergent validity of each one. Newer indicators such as SNIP and SJR are more likely to be accepted if we know they are correlated with other indicators of journal “quality,” especially when those other indicators use dissimilar methods to arrive at similar results (Cohn & Farrington, 2011; Martin, 1996; So, 1998; Weisheit & Regoli, 1984).
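To make the size-dependent/size-independent distinction concrete, here is a minimal sketch in Python; the journal names and counts are invented for illustration, and the simple calculation stands in for the general idea rather than the actual EF, AI, or CiteScore algorithms.

# Invented data: citations received and articles published by two journals.
journals = {
    "Journal A": {"citations": 900, "articles": 300},  # large journal
    "Journal B": {"citations": 450, "articles": 90},   # small journal
}

for name, data in journals.items():
    whole_journal = data["citations"]                   # size-dependent view (cf. total citations, EF)
    per_article = data["citations"] / data["articles"]  # size-independent view (cf. IF, CiteScore, AI)
    print(f"{name}: whole-journal impact = {whole_journal}, "
          f"typical-article impact = {per_article:.2f}")

# Journal A receives more citations in total, but Journal B has the higher
# citation rate per article, so the two families of metrics can rank the
# journals differently.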
2. Previous research

Although many studies have investigated the correlations among citation metrics, fewer have examined the relationships between citation metrics and stated preference ratings. Two findings from the pre-2000 literature are especially notable:
(1) Stated preference ratings sometimes represent each journal’s influence within a particular field or subfield rather than its more general scholarly impact. For instance, He and Pao (1986) discovered that the journal ratings assigned by scholars in the field of veterinary medicine are inversely related to the journals’ impact factors (r = −0.20). However, those same ratings are directly related to the number of times each journal has been cited within a set of 74 leading veterinary journals (r = 0.74). This suggests that veterinary medicine is a relatively insular field in which journals are evaluated largely in terms of their influence on practice. (2) The relationships between citation metrics and stated preference ratings are not always linear. In economics, sociology, and political science, for example, the top journals are assigned consistently higher subjective ratings than their IFs would suggest (Christenson & Sigelman, 1985; Ellis & Durden, 1991).
2 The phrase typical article is used to distinguish size-independent metrics from size-dependent (whole journal) metrics. It is not strictly correct, however, since nearly every journal’s citation distribution has a strong positive skew. For most journals, the average impact per article is substantially higher than the median impact per article (Calver & Bradley, 2009; Crookes et al., 2010; Seglen, 1997). 3 The distinction between size-dependent and size-independent metrics has important implications for their use (Nisonger, 2004; Walters, 2016a, 2016b, 2016c). A librarian evaluating the cost effectiveness of various journals is likely to be interested in size-dependent metrics. In contrast, an author deciding where to send his or her paper may be more interested in size-independent metrics.
The post-2000 literature includes just two studies that examined the relationships between stated preference ratings and multiple citation metrics.4 Using data for more than 11,000 journals in 27 fields, Haddawy, Hassan, Asghar, and Amin (2016) evaluated the correlations between three citation metrics (IF, IPP, and SNIP) and the journal ratings assigned by subject experts for the Excellence in Research Australia (ERA) research assessment exercise. Across the 27 fields, the ERA ratings were more closely related to SNIP (rho = 0.54) than to IPP (rho = 0.38) or IF (rho = 0.37). These findings do not bear directly on our research questions, however, since IF, IPP, and SNIP are all size-independent and unweighted. Saarela, Kärkkäinen, Lahtonen, and Rossi (2016) predicted subject experts’ ratings of more than 17,000 journals and conference proceedings in a wide range of subject areas based on three citation metrics—IPP, SNIP, and SJR—and a range of journal characteristics such as database coverage and language of publication. Of the three citation metrics, SNIP was the most predictive and IPP was the least predictive. These results are not conclusive with regard to our research questions, however, because SJR, the weighted metric used by Saarela et al., is a more effective predictor than IPP (unweighted) but a less effective predictor than SNIP (also unweighted). Studies that focus on other topics—those that critique a particular indicator, for instance, or that introduce new journal ratings for a particular field of study—sometimes present the correlations between subjective ratings and journal citation metrics. Table 1 shows data for a number of such studies, each of which reports the correlation(s) between at least one citation metric and at least one stated preference metric.5 Overall, these correlations suggest that stated preference ratings are more closely related to size-independent metrics such as IF, SNIP, AI, and SJR than to size-dependent metrics such as total citations, the h index, and EF. It is difficult to draw further conclusions from the table, however, since so few of the reported correlations involve citation metrics other than IF. Finally, it is important to realize that subjective ratings of journal quality are not always closely related. Lange (2006) found substantial differences between six subjective ratings of business journals, and other research shows that metrics based on prestige, contributions to theory, contributions to teaching, and contributions to practice are only modestly related (Miller & Dodge, 1979; Polonsky & Whitelaw, 2005; Shilbury & Rentschler, 2007). This has implications for the present study. If the various stated preference ratings differ in their central constructs, or if those constructs are not made clear to survey respondents, then the wide range of factors that shape respondents’ opinions may prevent the emergence of a clear pattern of relationships between the citation metrics and the stated preference ratings. 3. General methods Four disciplines—criminology and criminal justice (CCJ), library and information science (LIS), public administration (PAD), and social work (SWK)—were selected for investigation. The social sciences were chosen because they are covered well in both citation databases and stated preference studies. Moreover, the chosen fields are those for which weighted and unweighted metrics might be expected to differ. 
In LIS, for instance, a distinction can be made between the top information science journals (which tend to be cited in other high-impact journals) and the foremost practice-oriented journals (which are also cited often, but chiefly in journals with lower citation rates). Literature searches were conducted in 12 bibliographic databases in an attempt to identify all the stated preference journal ratings published in the four selected fields.6 However, certain kinds of studies and ratings were excluded: studies published before 2000 (since few of the necessary citation metrics are available before that date); ratings for which the construct evaluated by respondents was not related to prestige, reputation, impact, utility, or perceived quality; ratings based chiefly on pre-existing data rather than original survey research (e.g., Nixon, 2014; Sellers, Mathiesen, Smith, & Perry, 2006; Sorensen, 2009); ratings that placed journals into five or fewer categories (A, B, C, D, etc.) rather than assigning a specific rating to each journal; and studies that asked respondents to list just the top few journals, without considering any others. These searches and selection criteria led to the identification of eight stated preference ratings suitable for inclusion in the study—one in CCJ, five in LIS, one in PAD, and one in SWK. (See Table 2.)7 Three of the eight ratings are based on surveys of faculty, two on surveys of academic deans, two on surveys of library directors, and one on a survey of scholarly society members. The number of survey responses ranges from 37 to 556, with a median of 144. The number of rated journals ranges from 22 to 88, with a median of 71. Respondents were asked to rate each journal on a scale of 1–5, 1–7, or 1–10, and seven of the eight metrics—all but SWK-1—are simple averages (means) of the respondents’ ratings.
4 In contrast, comparisons of various citation metrics are not uncommon. See, for example, Bollen et al. (2009); Franceschet (2009); Leydesdorff (2009); Saarela et al. (2016); Tsai (2014); and Yan et al. (2011). The correlations between citation metrics are generally high—in the 0.80 to 0.90 range—when either size-dependent or size-independent measures are compared (when like is compared with like). 5 The table was compiled through a comprehensive search of the social science literature plus a less intensive search of the natural science literature, where stated preference studies are less common. It excludes those studies that compared metrics without presenting correlation coefficients. 6 Each search was essentially (journal as subject heading OR journal* as keyword) AND (rating* as keyword OR rank* as keyword OR impact* as keyword). If there was no subject heading for academic journals, the heading for periodicals was used. 7 Curry et al. (2014) was found in Google Scholar. Manzari (2013), Nisonger & Davis (2005), and Sellers et al. (2004) were each found in Academic Search Complete, Google Scholar, ProQuest Central, Scopus, Social Sciences Citation Index, and either LISTA or SocINDEX. Sorensen et al. (2006) was found in Academic Search Complete, Google Scholar, ProQuest Central, and SocINDEX. Searches of ABI/INFORM, the International Bibliography of the Social Sciences, PAIS, and Sociological Abstracts did not lead to the identification of any relevant journal ratings.
Table 1
Correlations between stated preference ratings and citation metrics, as reported in the literature, 2000 to present. Pearson's r or Spearman's rho, as appropriate. Each column represents the indicated metric as well as any closely related metrics. (The h index column also includes the g index, for instance, and the IF column also includes IPP.) Schlögl and Stock's (2004) data are for German-speaking scholars.

Stated preference rating | Field | Total citns. (Dep., Unwtd.) | h index (Dep., Unwtd.) | EF (Dep., Wtd.) | IF (Indep., Unwtd.) | SNIP (Indep., Unwtd.) | AI (Indep., Wtd.) | SJR (Indep., Wtd.)
Donohue & Fox, 2000 a | Mgt. science | — | — | — | 0.55 | — | — | —
West & McIlwaine, 2002 | Addiction | −0.01 | — | — | — | — | — | —
Schlögl & Stock, 2004 | LIS | — | — | — | −0.11 | — | — | —
Sellers et al., 2004 | Social work | — | — | — | 0.77 | — | — | —
Turban, Zhou, & Ma, 2004 | Inf. systems | — | — | — | Strong | — | — | —
Dul, Karwowski, & Vinken, 2005 | Ergonomics | — | — | — | 0.90 | — | — | —
Gorman & Kanet, 2005 | Mgt. science | — | — | — | 0.79 | — | — | —
Katerattanakul, Razi, Han, & Kam, 2005 | Inf. systems | — | — | — | 0.32 | — | — | —
Katerattanakul et al., 2005 | Inf. systems | — | — | — | 0.61 | — | — | —
Katerattanakul et al., 2005 | Inf. systems | — | — | — | 0.23 | — | — | —
Katerattanakul et al., 2005 | Inf. systems | — | — | — | 0.73 | — | — | —
Katerattanakul et al., 2005 | Inf. systems | — | — | — | 0.56 | — | — | —
Olson, 2005 | Mgt. science | — | — | — | 0.46 | — | — | —
Maier, 2006 a | Regional science | — | — | — | 0.01 | — | — | —
Marsh & Hunt, 2006 | Management | — | — | — | 0.79 | — | — | —
Giles & Garand, 2007 | Political science | — | — | — | 0.66 | — | — | —
Mingers & Harzing, 2007 | Business | — | — | — | 0.44 | — | — | —
Mingers & Harzing, 2007 | Business | — | — | — | 0.60 | — | — | —
Mingers & Harzing, 2007 | Business | — | — | — | 0.41 | — | — | —
Mingers & Harzing, 2007 | Business | — | — | — | 0.66 | — | — | —
Mingers & Harzing, 2007 | Business | — | — | — | 0.34 | — | — | —
Mingers & Harzing, 2007 | Business | — | — | — | 0.40 | — | — | —
Mingers & Harzing, 2007 | Business | — | — | — | 0.52 | — | — | —
Mingers & Harzing, 2007 | Business | — | — | — | 0.56 | — | — | —
Mingers & Harzing, 2007 | Business | — | — | — | 0.64 | — | — | —
Mingers & Harzing, 2007 | Business | — | — | — | 0.68 | — | — | —
Bartels, Magun-Jackson, Ryan, & Glass, 2008 | Psychology | — | — | — | Weak | — | 0.68 | —
Vanclay, 2008 | Forestry | — | 0.64 | — | 0.52 | — | — | —
Morris, Harvey, & Kelly, 2009 | Business | — | — | — | 0.77 | — | — | —
Rosenstreich & Wooliscroft, 2009 a | Accounting | — | Mod. | — | — | — | — | —
Goldstein & Maier, 2010 | Planning | — | — | — | None | — | — | —
Haddow & Genoni, 2010 | Social sciences | — | Weak | — | Mod. | Weak | — | —
Lee, Cronin, McConnell, & Dean, 2010 | Heterodox econ. | — | — | — | — | 0.24 | — | —
Serenko & Dohan, 2011 | Artificial intel. | — | 0.62 | — | 0.51 | — | — | —
Serenko & Dohan, 2011 | Artificial intel. | — | 0.65 | — | 0.56 | — | — | —
Zhang, Liu, Xu, & Wang, 2011 | Business | — | — | Mod. | Mod. | — | — | —
Zhang et al., 2011 | Business | — | — | Mod. | Mod. | — | — | —
Zhang et al., 2011 | Business | — | — | Weak | Weak | — | — | —
Ahlgren & Waltman, 2014 | 27 fields | — | — | — | Mod. | Strong | — | Strong
Cahn, 2014 | Accounting | — | — | — | Strong | — | — | —
Cahn, 2014 | Economics | — | — | — | Mod. | — | — | —
Cahn, 2014 | Finance | — | — | — | Strong | — | — | —
Cahn, 2014 | Management | — | — | — | Mod. | — | — | —
Cahn, 2014 | Marketing | — | — | — | Strong | — | — | —
Haddawy et al., 2016 | 27 fields | — | — | — | 0.37 | 0.54 | — | —
Haddawy et al., 2016 | Humanities | — | — | — | 0.19 | 0.27 | — | —
Haddawy et al., 2016 a | Natural sciences | — | — | — | 0.60 | 0.65 | — | —
Haddawy et al., 2016 a | Social sciences | — | — | — | 0.45 | 0.55 | — | —

a Average for several related journal ratings, citation impact indicators, or subject areas.
As Table 2 shows, Nisonger and Davis (2005) present four distinct metrics that differ in terms of the surveyed populations—deans of LIS degree programs versus directors of research libraries—and in the handling of missing values (i.e., respondents who declined to rate particular journals). While five of the eight metrics shown in Table 2 exclude missing values from the calculation of means, LIS-1 and LIS-3 assign the lowest possible rating (a score of zero) to missing values. That is, they rely on the assumption that less familiar journals are necessarily less important. This is an unconventional practice that may result in a bias against good journals in specialized or heterodox subfields (Garand, 1990; Mason, Steagall, & Fabritius, 1997; Weisheit & Regoli, 1984). At the same time, research has shown that journal reputation and familiarity are indeed related. Most studies report correlations in the 0.50–0.70 range (Luke & Doke, 1987; Nelson, Buss, & Katzko, 1983; Serenko & Bontis, 2011; Shilbury & Rentschler, 2007). Within this set of ratings, SWK-1 is unique in its handling of missing values. Sellers, Mathiesen, Perry, and Smith (2004) asked respondents to rate each journal on a 1–7 scale, then calculated a prestige score that accounts for both the mean rating
Table 2
Stated preference ratings included in the study.

Criminology and criminal justice
CCJ-1 (Sorensen, Snell, & Rodriguez, 2006, pp. 311–312)
Respondents: 555 members of the American Society of Criminology and/or the Academy of Criminal Justice Sciences. Limited to non-students with college/university affiliations and US addresses.
Journals: 69 periodicals identified from previous studies, guides, and web sites. Limited to scholarly journals that regularly publish US authors and that focus more on CCJ than on other fields.
Survey question and scale: "Rate the following journals on a scale of 1 (poor) to 10 (excellent)." Mean score; blank responses were disregarded. The authors assumed that an intentionally vague specification would capture prestige but that more specific instructions would capture other constructs such as utility.
Note: This study also presents separate results for criminal justice and criminology, highlighting the differences between the two subfields.

Library and information science
LIS-1 (Nisonger & Davis, 2005, pp. 346–349)
Respondents: 37 deans of LIS programs accredited by the American Library Association (ALA).
Journals: 71 journals identified in a previous ranking study or included in the JCR information science & library science category, plus a few newer and/or online journals.
Survey question and scale: "How important [is] publication in each journal . . . for promotion and tenure at your institution." 1–5 scale. Mean score; blank responses were counted as scores of 0.

LIS-2 (Nisonger & Davis, 2005, pp. 350–353)
Respondents: 37 deans of ALA-accredited LIS programs.
Journals: Same as for LIS-1.
Survey question and scale: "How important [is] publication in each journal . . . for promotion and tenure at your institution." 1–5 scale. Mean score; blank responses were disregarded.

LIS-3 (Nisonger & Davis, 2005, pp. 346–349)
Respondents: 56 directors of major research libraries (those in ARL, the Association of Research Libraries).
Journals: Same as for LIS-1.
Survey question and scale: Rate each journal "according to the prestige associated with publishing in it." 1–5 scale. Mean score; blank responses were counted as scores of 0.

LIS-4 (Nisonger & Davis, 2005, pp. 350–353)
Respondents: 56 directors of ARL libraries.
Journals: Same as for LIS-1.
Survey question and scale: Rate each journal "according to the prestige associated with publishing in it." 1–5 scale. Mean score; blank responses were disregarded.

LIS-5 (Manzari, 2013, pp. 46–47)
Respondents: 232 full-time faculty in ALA-accredited LIS programs.
Journals: 88 journals identified in a previous ranking study, plus selected journals included in the JCR information science & library science category.
Survey question and scale: Importance to respondent's research and teaching. 1–5 scale. Mean score; blank responses were disregarded.

Public administration
PAD-1 (Curry et al., 2014, p. 22)
Respondents: 293 faculty (academic staff) in public administration programs at UK and European universities.
Journals: 22 journals—the top 30, by IF, in the JCR public administration category, excluding those that are overly specialized or unlikely to interest European readers.
Survey question and scale: "Assess each journal in terms of the general quality of the articles it publishes." 1–10 scale. Mean score; blank responses were disregarded.
Note: This "general quality" metric is strongly related to the journals where respondents would (a) send their best papers and (b) expect to find the best research. Curry et al. do not present full rankings for either (a) or (b), however.

Social work
SWK-1 (Sellers et al., 2004, p. 152)
Respondents: 556 faculty in social work programs accredited by the Council on Social Work Education.
Journals: 38 journals identified in a previous ranking study or included in the JCR social work category.
Survey question and scale: Mean rating for overall quality (1–7 scale) multiplied by the square root of the proportion of respondents who rated the journal.
Note: For further information on this study, see Sellers et al. (2006).
and the proportion of respondents who rated each journal. Several other authors have adopted this approach (Garand, 1990; Nederhof, Luwel, & Moed, 2001; Nkereuwem, 1997). The correlations used to evaluate the two research questions are presented in Sections 4 and 5. Each correlation examines the relationship between one stated preference metric and one citation metric. The number of cases (journals) varies with each correlation but is always lower than the sample size reported in Table 2, since not every journal with a stated preference rating is included in the relevant citation databases. Because the JCR information science & library science category represents at least two distinct disciplines—LIS and management information systems (MIS)—the nine MIS journals identified by Ni, Sugimoto, and Cronin (2013) were excluded from the analysis.8
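For concreteness, the SWK-1 prestige score (Table 2: the mean quality rating on a 1–7 scale multiplied by the square root of the proportion of respondents who rated the journal) can be sketched in Python as follows. The journal names and ratings are hypothetical, not the original survey data.

import math

# Hypothetical 1-7 quality ratings; None marks a respondent who declined to rate the journal.
responses = {
    "Journal X": [7, 6, 7, None, 6, 7],
    "Journal Y": [6, None, None, 5, None, 6],
}

for journal, ratings in responses.items():
    rated = [r for r in ratings if r is not None]
    mean_quality = sum(rated) / len(rated)             # mean of the ratings actually given
    familiarity = len(rated) / len(ratings)            # proportion of respondents who rated the journal
    prestige = mean_quality * math.sqrt(familiarity)   # SWK-1-style prestige score
    print(f"{journal}: mean quality = {mean_quality:.2f}, "
          f"familiarity = {familiarity:.2f}, prestige = {prestige:.2f}")

Under this scheme, a journal with high ratings but low familiarity receives a lower prestige score than one rated equally well by most respondents.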
8 The excluded journals are Information & Management, the Information Systems Journal, Information Systems Research, the International Journal of Information Management, the Journal of Global Information Management, the Journal of Information Technology, the Journal of Management Information Systems,
4. Size-dependent versus size-independent metrics 4.1. Context Many stated preference surveys ask about the prestige or reputation of journals. Others ask about their scholarly impact; their coverage of recent innovations in theory or methods; their usefulness for research, teaching, or practice; or their value as publication outlets for tenure and promotion. In still other cases, the key construct is never specified. As Weisheit and Regoli (1984) have noted, the lack of clear answers to the question “What is being measured?” is a major difficulty in the interpretation of stated preference ratings. In particular, a key question for nearly any citation metric—“Does this indicator represent the impact of the journal as a whole (all articles combined) or the impact of a typical article?”—is seldom addressed directly in stated preference studies. The first research question is intended to resolve at least some of these ambiguities, examining whether stated preference journal ratings are more closely related to size-dependent or size-independent citation metrics. Before proceeding with the correlation analysis, we should first examine whether the instructions to survey respondents refer specifically to journal characteristics, article characteristics, or both. (1) Six of the eight metrics shown in Table 2 appear to focus on typical articles. For instance, the survey by Nisonger and Davis (2005, p. 344) “asked respondents to rate a list of seventy-one journals. . .according to their perception of ‘how important publication in each journal is for promotion and tenure at your institution.”’ This presumably refers to articles rather than journals, since scholars present articles—not journals—in their applications for promotion and tenure. PAD-1 also emphasizes articles rather than journals: “Please assess each journal in terms of the general quality of the articles it publishes” (Curry, Van de Walle, & Gadellaa, 2014, p. 33). Likewise, the survey for SWK-1 asked respondents to “rate 38 social work journals [in terms of] overall quality” (Sellers et al., 2004, p. 147). Although the instructions mention journals rather than articles, the main construct (quality) is embedded in the individual articles. Admittedly, however, the interpretation of PAD-1 and SWK-1 is not entirely straightforward. (2) One metric, LIS-5, seems to refer to whole journals. Manzari (2013, p. 44) asked respondents to “rate a list of journals on a scale from 1 (low) to 5 (high) based on each journal’s importance to their research and teaching.” For LIS-5, respondents are likely to be thinking of entire journals rather than individual articles, since (a) Manzari specifically mentions journals and (b) the importance (utility) of each journal can be expected to vary with the number of articles it publishes. (3) CCJ-1 is ambiguous. Sorensen et al. (2006, p. 310) focus, ostensibly, on journals rather than articles, but their construct is not entirely clear; “Respondents were asked to ‘rate the following journals on a scale of 1 (poor) to 10 (excellent).”’ Sorensen et al. provided intentionally vague instructions because “the goal was to capture respondents’ overall impression of the prestige of the journal rather than a particular dimension (e.g., its utility).” As noted in Section 2, previous research provides some support for the idea that stated preference ratings are more closely related to size-independent metrics than to size-dependent metrics. 
For six of the eight metrics evaluated here, the instructions to respondents further support this expectation.

4.2. Citation metrics and comparisons

We can evaluate the first research question by comparing the correlations between (a) each stated preference metric and an appropriate size-dependent citation metric, and (b) each stated preference metric and an appropriate size-independent citation metric. In choosing citation metrics, the goal was to find one or more pairs of metrics that differ only in the characteristic of interest (size dependence)—not in other attributes such as data source, cited-document window, or treatment of journal self-citations. The citation metrics available at six data download sites9 were each evaluated with regard to the criteria shown in Table 3. Two pairs of citation metrics are appropriate for evaluating the first research question. As Table 3 shows, the eigenfactor (EF) and article influence (AI) metrics are identical in all respects except that EF is size-dependent while AI is size-independent. Likewise, the total citations (3 years) (TC3) and CiteScore metrics are identical except that TC3 is size-dependent while CiteScore is size-independent. Having identified comparable citation metrics, we can evaluate (a) whether EF (size-dependent) or AI (size-independent) is more closely correlated with each stated preference metric, and (b) whether TC3 (size-dependent) or CiteScore (size-independent) is more closely correlated with each stated preference metric. Ideally, this procedure would result in a set of 16 comparisons—two for each stated preference metric. In practice, however, the four citation metrics are not available for
the Journal of the Association for Information Systems, and MIS Quarterly. ACM Transactions on Information Systems was excluded for the same reason. For more on the distinction between LIS and MIS journals, see Abrizah et al. (2013); Manzari (2013); and Odell and Gabbard (2008). 9 See JCR (Clarivate Analytics, 2017); Eigenfactor (University of Washington, 2017); CWTS Journal Indicators (Centre for Science and Technology Studies, 2017); SCImago Journal & Country Rank (SCImago Research Group, 2017); Scopus Journal Metrics (Elsevier, 2017); and Google Scholar Metrics (Google, 2017).
Table 3
Citation metrics included in the analysis of size-dependent and size-independent metrics. EF and AI data were downloaded from the Eigenfactor site (2004–2006) and JCR (2007–present). TC3 and CiteScore data were downloaded from Scopus Journal Metrics.

Metric | Size-independent? | Data source | Earliest date for which data are available | Citation-counting window | Cited-document window | Are self-citations included? | Docs. for which citations are counted | Docs. that contribute to the article count | Normalized to account for disciplinary differences in citation impact? | Weighted to account for the impact of each citing journal?
EF | No | Web of Science | 1997 | 1 year | 5 years | No | All | — | Yes | Yes
AI | Yes | Web of Science | 1997 | 1 year | 5 years | No | All | Citable | Yes | Yes
TC3 | No | Scopus | 1999 | 1 year | 3 years | Yes | All | — | No | No
CiteScore | Yes | Scopus | 2011 | 1 year | 3 years | Yes | All | All | No | No
Table 4
Correlations (Pearson's r): stated preference ratings versus size-dependent and size-independent citation metrics—without transformations.

Stated preference metric | Size-dependent metric | r | n | Size-independent metric | r | n | Which citation metric better represents the stated preference metric?
CCJ-1 (2006) | EF (2006) | 0.45 | 26 | AI (2006) | 0.58 | 26 | Size-independent
LIS-1 (2005) | EF (2005) | 0.39 | 44 | AI (2005) | 0.46 | 44 | Size-independent
LIS-2 (2005) | EF (2005) | 0.49 | 44 | AI (2005) | 0.52 | 44 | Size-independent
LIS-3 (2005) | EF (2005) | 0.14 | 44 | AI (2005) | 0.12 | 44 | Size-dependent
LIS-4 (2005) | EF (2005) | 0.33 | 44 | AI (2005) | 0.36 | 44 | Size-independent
LIS-5 (2013) | EF (2013) | 0.14 | 52 | AI (2013) | 0.14 | 52 | Size-independent
LIS-5 (2013) | TC3 (2013) | −0.08 | 64 | CiteScore (2013) | −0.01 | 64 | Size-dependent
PAD-1 (2014) | EF (2014) | 0.73 | 20 | AI (2014) | 0.48 | 20 | Size-dependent
PAD-1 (2014) | TC3 (2013) | 0.73 | 21 | CiteScore (2014) | 0.53 | 21 | Size-dependent
SWK-1 (2004) | EF (2004) | 0.15 | 30 | AI (2004) | 0.27 | 30 | Size-independent
every year in which a stated preference survey was conducted. The initial correlations therefore include just 10 comparisons, representing the stated preference metrics for which appropriate citation data (for the year of the stated preference study, plus or minus two years) were available.

As noted in Section 2, the relationships between stated preference ratings and citation metrics are not always linear (Christenson & Sigelman, 1985; Ellis & Durden, 1991). This can be seen in the scatterplots for several of the relationships examined here (e.g., LIS-3 versus AI). Rather than simply transforming the relationships that were visibly non-linear, however, it seemed preferable to undertake a second set of correlations in which each variable was transformed, if necessary, to achieve the best fit. This gives us an opportunity to evaluate each citation metric on a level playing field, allowing for the use of simple transformations whenever they might maximize the correlation coefficient. The second set of correlations is identical to the first except that each variable was transformed to achieve the best fit.10 Significance tests were not undertaken, since the sample is not random and the results cannot be extended to journals other than those included in the analysis.11 Each comparison is essentially a single case study, and the validity of the results can be seen in the extent to which clear patterns or relationships emerge despite differences in subject areas (CCJ, LIS, PAD, SWK), dates (2004–2014), sources of citation data (Web of Science, Scopus), and other characteristics. For instance, EF and AI are normalized to account for disciplinary differences in citation impact while TC3 and CiteScore are not. If size-independent metrics are more strongly (or less strongly) correlated with stated preference ratings, that relationship ought to be discernible for both EF/AI and TC3/CiteScore.

4.3. Results and discussion

Table 4 presents the 10 initial comparisons—those for which none of the metrics were transformed. As the table shows, the size-independent citation metric (AI or CiteScore) is more closely related to the stated preference metric in 6 of the 10 cases. However, the size-dependent metric (EF or TC3) is more closely related to the stated preference metric in four cases.
10 For each correlation, seven transformations were evaluated: ln(x), ln(y), ln(x) and ln(y) together, sqrt (x), sqrt (y), 1/x, and 1/y. In several cases, the original (non-transformed) metrics yielded the best fit. 11 Howell (2013, pp. 287–288) describes a significance test that can be used to evaluate the differences between two correlation coefficients that are not independent (e.g., that have one variable in common). That test, devised by Williams (1959), would be appropriate if the journals were sampled randomly from defined populations. Also see Diedenhofen and Musch (2015, 2017).
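Footnote 10 lists the candidate transformations used to find the best-fitting (possibly curvilinear) relationship for each pair of metrics. A minimal sketch of that search, with illustrative variable names and invented data rather than the author's actual code (and assuming Python 3.10+ for statistics.correlation), might look like this:

import math
from statistics import correlation  # Pearson's r; available in Python 3.10+

# Candidate transformations: the untransformed variables plus the seven alternatives
# listed in footnote 10 (x = citation metric, y = stated preference metric).
TRANSFORMATIONS = {
    "none":         lambda x, y: (x, y),
    "ln(x)":        lambda x, y: ([math.log(v) for v in x], y),
    "ln(y)":        lambda x, y: (x, [math.log(v) for v in y]),
    "ln(x), ln(y)": lambda x, y: ([math.log(v) for v in x], [math.log(v) for v in y]),
    "sqrt(x)":      lambda x, y: ([math.sqrt(v) for v in x], y),
    "sqrt(y)":      lambda x, y: (x, [math.sqrt(v) for v in y]),
    "1/x":          lambda x, y: ([1 / v for v in x], y),
    "1/y":          lambda x, y: (x, [1 / v for v in y]),
}

def best_fit(x, y):
    # Return (transformation name, r) for the transformation with the largest |r|.
    # As in Tables 5 and 8, coefficients from 1/x or 1/y are multiplied by -1 so that
    # positive values represent direct relationships between the untransformed variables.
    results = {}
    for name, transform in TRANSFORMATIONS.items():
        tx, ty = transform(x, y)
        r = correlation(tx, ty)
        results[name] = -r if name in ("1/x", "1/y") else r
    return max(results.items(), key=lambda item: abs(item[1]))

# Invented example: one citation metric and one set of mean survey ratings.
citation_metric = [0.002, 0.010, 0.004, 0.020, 0.001, 0.008]
survey_rating = [2.1, 3.8, 2.9, 4.3, 1.9, 3.5]
print(best_fit(citation_metric, survey_rating))  # prints the best-fitting transformation and its r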
Table 5
Correlations (Pearson's r): stated preference ratings versus size-dependent and size-independent citation metrics—best-fit (transformed) relationships. In the transformation descriptions, x represents the citation metric and y represents the stated preference metric. Correlation coefficients based on 1/x or 1/y transformations were multiplied by −1 so that positive values represent direct (rather than inverse) relationships between the non-transformed variables.

Stated preference metric | Size-dependent metric | Transformation | r | n | Size-independent metric | Transformation | r | n | Which citation metric better represents the stated preference metric?
CCJ-1 (2006) | EF (2006) | sqrt(x) | 0.46 | 26 | AI (2006) | None | 0.58 | 26 | Size-independent
LIS-1 (2005) | EF (2005) | sqrt(x) | 0.39 | 44 | AI (2005) | ln(x) | 0.55 | 44 | Size-independent
LIS-2 (2005) | EF (2005) | sqrt(x) | 0.49 | 44 | AI (2005) | ln(x) | 0.59 | 44 | Size-independent
LIS-3 (2005) | EF (2005) | 1/x | 0.17 | 44 | AI (2005) | 1/x | 0.28 | 44 | Size-independent
LIS-4 (2005) | EF (2005) | ln(x) | 0.35 | 44 | AI (2005) | 1/x | 0.46 | 44 | Size-independent
LIS-5 (2013) | EF (2013) | ln(x), ln(y) | 0.20 | 52 | AI (2013) | 1/x | 0.29 | 52 | Size-independent
LIS-5 (2013) | TC3 (2013) | None | −0.08 | 64 | CiteScore (2013) | 1/y | 0.11 | 64 | Size-independent
PAD-1 (2014) | EF (2014) | ln(x), ln(y) | 0.80 | 20 | AI (2014) | ln(x), ln(y) | 0.58 | 20 | Size-dependent
PAD-1 (2014) | TC3 (2014) | None | 0.73 | 21 | CiteScore (2014) | ln(x) | 0.55 | 21 | Size-dependent
SWK-1 (2004) | EF (2004) | ln(x), ln(y) | 0.23 | 30 | AI (2004) | sqrt(x) | 0.29 | 30 | Size-independent
Looking more closely, we can discern whether the results vary systematically by the characteristics of the stated preference metrics. Specifically:

(1) The phrasing of the survey questions (Section 4.1)—the apparent emphasis on typical articles or whole journals—does not have a clear impact on the results. LIS-5, based on a survey question that emphasizes whole journals, is equally related to the size-dependent and size-independent metrics. Likewise, the PAD-1 question emphasizes typical articles rather than journals, but PAD-1 is more closely related to the size-dependent metrics than to the size-independent metrics.
(2) The metrics that incorporate the responses of practitioners—CCJ-1, LIS-3, and LIS-4—are not systematically different from those based on the responses of deans or faculty. The PAD-1 results do suggest, however, that British and European respondents may be different from American respondents in their tendency to consider journal size when rating journals.
(3) The year of the study may be a significant covariate. Size-independent metrics are more predictive than size-dependent metrics for five of the six studies published before 2007 but for just one of the four studies published thereafter.
(4) Subject area may also be a factor. Size-dependent metrics stand out as the best predictors of stated preference ratings in just one of the four disciplines: public administration.
(5) Size-dependent metrics are especially predictive when relatively few journals are examined. This is not a linear relationship, but a characteristic of PAD-1, in which only the top 30 journals (by impact factor) were evaluated by respondents.

Together, these patterns suggest that Table 4 may be interpreted in at least two different ways. One interpretation is simply that the results are ambiguous, with a six-to-four split concerning the question of whether size-independent or size-dependent metrics are more closely related to stated preference ratings. A second interpretation is that the size-independent metrics are more closely related—except in the case of PAD-1, which is distinctive due to its recency, its subject scope, its British/European emphasis, and its inclusion of just the higher-impact journals.

As noted in Section 4.2, signs of non-linearity were observed in several of the relationships shown in Table 4. To allow for the detection of curvilinear relationships, a second set of correlations was undertaken based on metrics that had been transformed, if necessary, to achieve the best fit. Those results are shown in Table 5. When transformations are allowed, the results are clearer; the size-independent citation metric (AI or CiteScore) is more closely related to the stated preference metric in 8 of the 10 cases. The size-dependent metric (EF or TC3) is more closely related to the stated preference metric in only two cases—the two PAD-1 comparisons. These new results match the earlier results with regard to the phrasing of the survey questions, the characteristics of the survey respondents, and the distinctiveness of PAD-1.

Taken together, Tables 4 and 5 support the conclusion that stated preference journal ratings are more closely related to size-independent (typical article) citation metrics than to size-dependent (whole journal) metrics. These results are merely suggestive, however. They might be different if the stated preference studies had focused on the utility of each journal as a source of relevant research rather than the reputation or value of each journal as a publication outlet.
Nonetheless, the tendency to view journals in terms of a typical article persists even when the instructions to survey respondents are somewhat ambiguous. These results are consistent with previous research—in particular, with the relationships shown in Table 1. This analysis also confirms that the relationships between citation metrics and stated preference ratings are not always linear (Christenson & Sigelman, 1985; Ellis & Durden, 1991). The PAD-1 results cannot be readily explained within this framework, although they may reflect patterns or practices unique to public administration. It is possible, for instance, that among the top-30 journals included in the PAD-1 survey, the larger journals are more familiar to respondents and therefore more likely to elicit favorable ratings. Alternatively, the PAD-1 results may reflect the British and European context of that particular study (Curry et al., 2014).
Table 6
Citation metrics included in the analysis of unweighted and weighted metrics. 5IF data were downloaded from JCR. AI data were downloaded from the Eigenfactor site (2004–2006) and JCR (2007–present). SNIP data were downloaded from CWTS. SJR data were downloaded from SCImago.

Metric | Weighted to account for the impact of each citing journal? | Data source | Earliest date for which data are available | Citation-counting window | Cited-document window | Size-independent? | Are self-citations included? | Docs. for which citations are counted | Docs. that contribute to the article count | Normalized to account for disciplinary differences in citation impact?
5IF | No | Web of Science | 2007 | 1 year | 5 years | Yes | Yes | All | Citable | No
AI | Yes | Web of Science | 1997 | 1 year | 5 years | Yes | No | All | Citable | Yes
SNIP | No | Scopus | 1999 | 1 year | 5 years | Yes | Yes | Citable | Citable | Yes
SJR | Yes | Scopus | 1999 | 1 year | 5 years | Yes | Yes a | Citable | Citable | Yes

a Self-citations are limited to one-third of the total citation count.
Finally, most of the correlations reported in Tables 4 and 5 reveal only modest links between the stated preference ratings and the citation metrics, whether size-dependent or size-independent. For Table 4, the median magnitude of the correlations is 0.38; the maximum, 0.73. For Table 5, the median is 0.43; the maximum, 0.80. In none of the four subject areas are stated preference ratings and citation metrics closely and consistently related.
5. Unweighted versus weighted metrics 5.1. Context As noted in Section 1, unweighted citation metrics assign equal weight to each citation, regardless of the characteristics of the citing journal. In contrast, weighted metrics assign higher weights to citations that appear in more influential journals (González-Pereira, Guerrero-Bote, & de Moya-Anegón, 2010; Guerrero-Bote & de Moya-Anegón, 2012; West, Bergstrom, & Bergstrom, 2010; West et al., 2015). The use of weighted metrics is based on two assumptions: first, that citations in high-impact journals (or journals that are centrally placed within the citation network) ought to count for more; and second, that we should therefore explicitly consider the influence of the citing journal when calculating journal ratings. It is possible, however, to reject the second point even if we accept the first. Even if we accept that a citation in a top-rated journal is more important than a citation in a lesser journal, we still might not find it necessary to account for that fact when calculating the cited journal’s rating. After all, a citation in a top journal is especially likely to generate additional citations in subsequent papers—in subsequent rounds of citing behavior. For instance, being cited in Nature carries a built-in advantage that is reflected in the behavior of readers (authors) who are presumably (a) more likely to encounter the citation in Nature than in a lower-impact journal and (b) more likely to assume that a relevant paper cited in Nature is something they ought to consider citing as well. Conversely, a citation in a lower-ranked journal inherently counts for less because it is less likely to generate those additional citations. If the importance of a citation is inherent in the extent to which it leads to subsequent citations, one can argue that the desired outcome—more credit for citations in high-impact journals—prevails even when unweighted metrics are used. Although both unweighted and weighted metrics are in widespread use, there is no reliable guide to whether (or when) each type of metric is especially appropriate. The second research question addresses this problem, examining whether stated preference journal ratings are more closely related to unweighted or weighted citation metrics. The analysis does not consider which metric ought to be used, but it does examine whether unweighted or weighted metrics are more consistent with scholars’ subjective ratings.
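To illustrate the distinction numerically, the following minimal sketch compares an unweighted citation count with a weighted score for a single target journal. The citation counts and influence weights are invented; in AI and SJR the weights are not fixed numbers but are derived iteratively from the full journal citation network.

# Invented data: citations a target journal receives from three citing journals,
# and an illustrative influence weight for each citing journal.
citations_from = {"High-impact journal": 10, "Mid-impact journal": 40, "Low-impact journal": 150}
influence_weight = {"High-impact journal": 3.0, "Mid-impact journal": 1.0, "Low-impact journal": 0.2}

# Unweighted: every citation counts equally (cf. IF, SNIP).
unweighted_score = sum(citations_from.values())

# Weighted: citations from more influential journals count for more (cf. AI, SJR).
weighted_score = sum(count * influence_weight[source]
                     for source, count in citations_from.items())

print(f"Unweighted citation count: {unweighted_score}")    # 200
print(f"Weighted citation score:   {weighted_score:.1f}")  # 100.0

Here the target journal is cited mostly by low-impact journals, so it fares worse on the weighted score than on the raw citation count; a journal cited mainly in high-impact outlets would show the opposite pattern.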
5.2. Citation metrics and comparisons For the second research question, we will compare the correlations between (a) each stated preference metric and an appropriate unweighted citation metric, and (b) each stated preference metric and an appropriate weighted citation metric. Citation metrics were chosen using the procedure described in Section 4.2. Two pairs of citation metrics are appropriate for this analysis: 5IF/AI and SNIP/SJR. As Table 6 shows, 5IF and AI are identical in most respects other than their unweighted/weighted status. They do differ, however, in their treatment of self-citations and in the fact that AI is normalized to account for disciplinary differences in citation impact. For that reason, we cannot conclusively state that any
Table 7
Correlations (Pearson's r): stated preference ratings versus unweighted and weighted citation metrics—without transformations.

Stated preference metric | Unweighted metric | r | n | Weighted metric | r | n | Which citation metric better represents the stated preference metric?
CCJ-1 (2006) | 5IF (2007) | 0.62 | 26 | AI (2007) | 0.73 | 26 | Weighted
CCJ-1 (2006) | SNIP (2006) | 0.59 | 42 | SJR (2006) | 0.66 | 43 | Weighted
LIS-1 (2005) | 5IF (2007) | 0.33 | 43 | AI (2007) | 0.40 | 43 | Weighted
LIS-1 (2005) | SNIP (2005) | 0.36 | 53 | SJR (2005) | 0.41 | 54 | Weighted
LIS-2 (2005) | 5IF (2007) | 0.44 | 43 | AI (2007) | 0.50 | 43 | Weighted
LIS-2 (2005) | SNIP (2005) | 0.49 | 53 | SJR (2005) | 0.52 | 54 | Weighted
LIS-3 (2005) | 5IF (2007) | −0.04 | 43 | AI (2007) | 0.03 | 43 | Unweighted
LIS-3 (2005) | SNIP (2005) | 0.26 | 53 | SJR (2005) | 0.22 | 54 | Unweighted
LIS-4 (2005) | 5IF (2007) | 0.23 | 43 | AI (2007) | 0.29 | 43 | Weighted
LIS-4 (2005) | SNIP (2005) | 0.39 | 53 | SJR (2005) | 0.40 | 54 | Weighted
LIS-5 (2013) | 5IF (2013) | 0.07 | 53 | AI (2013) | 0.14 | 52 | Weighted
LIS-5 (2013) | SNIP (2013) | 0.13 | 72 | SJR (2013) | 0.30 | 72 | Weighted
PAD-1 (2014) | 5IF (2014) | 0.54 | 20 | AI (2014) | 0.48 | 20 | Unweighted
PAD-1 (2014) | SNIP (2014) | 0.78 | 21 | SJR (2014) | 0.67 | 21 | Unweighted
SWK-1 (2004) | SNIP (2004) | 0.39 | 31 | SJR (2004) | 0.36 | 31 | Unweighted
differences between the two metrics are attributable solely to their unweighted/weighted status. SNIP and SJR are a closer match, however. They are identical in all key respects, apart from their unweighted/weighted status.12 For nearly every stated preference metric shown in Table 2, we can evaluate, first, whether 5IF (unweighted) or AI (weighted) is more closely related to the stated preference metric; and, second, whether SNIP (unweighted) or SJR (weighted) is more closely related to the stated preference metric. Although missing data were a significant problem for the first research question, appropriate data were available for 15 of the 16 comparisons that address the second research question.13 As before, two sets of correlations were evaluated. The first set of correlations is based on the original metrics. The second set is based on metrics that were transformed to achieve the best fit, as described in Section 4.2. 5.3. Results and discussion Without transformations, the weighted citation metric (AI or SJR) is more closely related to the stated preference metric in 10 of the 15 cases. (See Table 7.) In contrast, the unweighted metric (5IF or SNIP) is more closely related to the stated preference metric in just five cases. This suggests that weighted metrics, which account for the impact of the citing journal, are more nearly consistent with respondents’ subjective ratings of journals. As with Tables 4 and 5, however, most of the correlations are modest (median magnitude = 0.40; maximum = 0.78). Table 7 reveals no systematic pattern over time, or with regard to the respondents’ faculty or non-faculty status. However, the unweighted metrics are more strongly correlated with the stated preference ratings in two subject areas: public administration and social work. As noted in Section 4.3, PAD-1 is distinctive due to its recency, its subject scope, its British/European emphasis, and its inclusion of only the higher-impact journals. Table 8, based on the transformed metrics, is fully consistent with Table 7. The weighted citation metric (AI or SJR) is more closely related to the stated preference metric in 13 of the 15 cases. This finding is well supported, since it persists despite differences in subject field, date, question wording, respondents’ characteristics, number of journals examined, data source, normalization for disciplinary differences, and weighting method. In particular, there is no stated preference metric for which the two unweighted metrics have an overall advantage over the two weighted metrics. These findings may be relevant to the assertion that unweighted metrics such as IF represent popularity while weighted metrics such as AI and SJR represent prestige (Bollen, Rodriguez, & Van de Sompel, 2006; Franceschet, 2010; Yan & Ding, 2010). As Ding and Cronin (2011, p. 80) have stated, “The popularity of a social actor [or journal] can be defined as the total number of endorsements (acclaim, applause, citations) received from all other actors, and prestige as the number of endorsements coming specifically from experts [highly rated journals].” Within this framework, weighted citation metrics can be regarded as expert endorsements because they assign more weight to citations that appear in the top journals. If the distinction between popularity (unweighted metrics) and prestige (weighted metrics) is regarded as a definition rather
12 With SNIP and SJR, different methods are used to account for disciplinary differences in impact (González-Pereira et al., 2010; Guerrero-Bote & de Moya-Anegón, 2012; Moed, 2010; Waltman et al., 2013). However, those methodological differences do not invalidate the essential similarities between the two metrics. 13 For five comparisons, the years do not match exactly, but in all cases the year of the citation metric is within two years of the date of the stated preference study.
Table 8
Correlations (Pearson's r): stated preference ratings versus unweighted and weighted citation metrics—best-fit (transformed) relationships. In the transformation descriptions, x represents the citation metric and y represents the stated preference metric. Correlation coefficients based on 1/x or 1/y transformations were multiplied by −1 so that positive values represent direct (rather than inverse) relationships between the non-transformed variables.

Stated preference metric | Unweighted metric | Transformation | r | n | Weighted metric | Transformation | r | n | Which citation metric better represents the stated preference metric?
CCJ-1 (2006) | 5IF (2007) | sqrt(x) | 0.62 | 26 | AI (2007) | None | 0.73 | 26 | Weighted
CCJ-1 (2006) | SNIP (2006) | None | 0.59 | 42 | SJR (2006) | None | 0.66 | 43 | Weighted
LIS-1 (2005) | 5IF (2007) | ln(x) | 0.40 | 43 | AI (2007) | ln(x) | 0.45 | 43 | Weighted
LIS-1 (2005) | SNIP (2005) | None | 0.36 | 53 | SJR (2005) | sqrt(x) | 0.42 | 54 | Weighted
LIS-2 (2005) | 5IF (2007) | sqrt(x) | 0.47 | 43 | AI (2007) | sqrt(x) | 0.53 | 43 | Weighted
LIS-2 (2005) | SNIP (2005) | None | 0.49 | 53 | SJR (2005) | sqrt(x) | 0.54 | 54 | Weighted
LIS-3 (2005) | 5IF (2007) | 1/x | 0.17 | 43 | AI (2007) | 1/x | 0.18 | 43 | Weighted
LIS-3 (2005) | SNIP (2005) | sqrt(x) | 0.28 | 53 | SJR (2005) | ln(x) | 0.25 | 54 | Unweighted
LIS-4 (2005) | 5IF (2007) | ln(x), ln(y) | 0.32 | 43 | AI (2007) | ln(x), ln(y) | 0.40 | 43 | Weighted
LIS-4 (2005) | SNIP (2005) | sqrt(x) | 0.41 | 53 | SJR (2005) | ln(x), ln(y) | 0.45 | 54 | Weighted
LIS-5 (2013) | 5IF (2013) | ln(x), ln(y) | 0.24 | 53 | AI (2013) | 1/x | 0.29 | 52 | Weighted
LIS-5 (2013) | SNIP (2013) | sqrt(x) | 0.23 | 72 | SJR (2013) | ln(x), ln(y) | 0.32 | 72 | Weighted
PAD-1 (2014) | 5IF (2014) | ln(x), ln(y) | 0.57 | 20 | AI (2014) | ln(x), ln(y) | 0.58 | 20 | Weighted
PAD-1 (2014) | SNIP (2014) | None | 0.78 | 21 | SJR (2014) | None | 0.67 | 21 | Unweighted
SWK-1 (2004) | SNIP (2004) | None | 0.39 | 31 | SJR (2004) | 1/x | 0.46 | 31 | Weighted
Table 9
Number of times each stated preference study mentions prestige, popularity, or reputation. All counts include variant forms of the words (prestigious, popular, etc.) but exclude occurrences in bibliographic references, running titles, and table titles that appear on the second and subsequent pages of tables.

CCJ-1 (Sorensen et al., 2006): Prestige is mentioned once in the title, once in the abstract, and 23 times in the text. Popularity is mentioned just twice. Reputation is not mentioned. None of these terms were used in the instructions to respondents, although the authors state that they tried to capture each respondent's "overall impression of the prestige of the journal" (p. 310).

LIS-1, LIS-2, LIS-3, and LIS-4 (Nisonger & Davis, 2005): Prestige is mentioned once in the abstract and 12 times in the text. The two main journal rating tables both refer to "Average rating of journal prestige in terms of value for tenure and promotion." Popularity and reputation are not mentioned. None of these terms were used in the instructions to respondents with regard to LIS-1, LIS-2, LIS-3, or LIS-4.

LIS-5 (Manzari, 2013): Prestige is mentioned once in the title, once in the abstract, and 20 times in the text. Popularity and reputation are not mentioned. None of these terms were used in the instructions to respondents except that an open-ended question, not used in the journal ratings, "asked for any comments about the prestige of library and information science journals" (p. 45).

PAD-1 (Curry et al., 2014): Prestige is not mentioned. Popularity and reputation are each mentioned just once. The authors avoid all three words by referring to respondents' perceptions or to the apparent quality of the various journals. Their survey asked respondents to rate the "general quality of the articles" published in each journal.

SWK-1 (Sellers et al., 2004): Prestige is mentioned 51 times in the text. The two main journal-rating tables both refer to "prestige ratings" or "prestige rankings." Moreover, the authors explicitly define prestige (the main rating for each journal) as overall quality times the square root of familiarity. Popularity is not mentioned. Reputation is mentioned once in the title and 19 times in the text, almost always in reference to "reputational" or "reputation-based" journal ratings (i.e., stated preference ratings). Reputation is contrasted not with prestige, but with impact as assessed through the use of citation data.
than a hypothesis, then there is no need for empirical support. At the same time, it may be helpful to evaluate whether the stated preference metrics that explicitly measure prestige (rather than popularity or reputation) are the ones most closely associated with weighted rather than unweighted citation metrics. As Table 9 shows, four of the five stated preference studies—all but Curry et al. (2014)—do focus on prestige as the key construct. Those four studies represent seven of the eight stated preference ratings. It is safe to conclude that the seven prestige-based ratings are more closely associated with weighted citation metrics than with unweighted citation metrics. This is a weak test of the underlying assertion, however, because the sample includes just one stated preference metric that does not focus on prestige (PAD-1) and because PAD-1 is about equally related to the unweighted and weighted citation metrics. The conclusion that prestige-based stated preference metrics are more closely related to weighted (rather than unweighted) citation metrics can perhaps be subsumed under the more general finding that the same relationship holds true for nearly all stated preference metrics, whether or not they focus on prestige as the central construct.
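The counts in Table 9 follow a simple convention: variant forms of each word are included, and occurrences in the reference lists, running titles, and repeated table titles are excluded. The sketch below illustrates one way such counts could be produced. The input file name, the rule used to strip the reference list, and the regular expressions are assumptions made for illustration; they are not the procedure actually used to compile Table 9.

# Minimal sketch (not the original procedure) of the word counting summarized in
# Table 9: count variant forms of "prestige", "popularity", and "reputation" in
# an article's body text, excluding the reference list. The splitting rule and
# the input file name are assumptions made for illustration.
import re

def count_terms(full_text: str) -> dict:
    # Crude heuristic: drop everything from the "References" heading onward so
    # that occurrences in bibliographic references are not counted.
    body = re.split(r"\nReferences\b", full_text, maxsplit=1)[0]
    patterns = {
        "prestige": r"\bprestig\w*",      # prestige, prestigious, ...
        "popularity": r"\bpopular\w*",    # popular, popularity, ...
        "reputation": r"\breputation\w*", # reputation, reputational, ...
    }
    return {term: len(re.findall(pat, body, flags=re.IGNORECASE))
            for term, pat in patterns.items()}

if __name__ == "__main__":
    with open("article.txt", encoding="utf-8") as f:  # hypothetical input file
        print(count_terms(f.read()))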
6. Conclusion

This paper presents modest evidence that stated preference journal ratings are more closely related to size-independent citation metrics (those that represent the impact of a typical article) than to size-dependent citation metrics (those that represent the impact of the journal as a whole). It also shows, more conclusively, that stated preference journal ratings are more closely related to weighted citation metrics (those that account for the impact of the citing journal) than to unweighted citation metrics (those that do not account for the impact of the citing journal). However, these findings are based only on the stated preference studies shown in Table 2; they do not necessarily hold with regard to other journal ratings or other subject areas.

Tables 4 and 5 suggest that respondents tend to adopt a size-independent view—to rate journals in terms of a typical article—even when the survey instructions do not specify that interpretation. There is no reason why the instructions cannot be made more clear, however. It is useful to discover how respondents behave when assigning journal ratings, but it would perhaps be more useful to explicitly incorporate the whole-journal/typical-article distinction into the design of stated preference surveys—to ensure that respondents know what is expected of them. A related distinction can be made between utility for readers, who are more likely to discover useful papers in the journals that publish more articles, all else equal; and utility for authors, whose promotion and tenure prospects are essentially unrelated to the size of the journals in which they publish.

The results for unweighted and weighted citation metrics (Tables 7 and 8) can be considered with regard to the weighted metrics that have emerged over the past decade. Historically, one advantage of citation metrics has been the clarity of the central construct: scholarly impact. Impact factor (IF), for example, is both unambiguous and directly interpretable. The meaning of scholarly impact becomes less certain, however, when we control for factors such as disciplinary differences in impact or the relative impact of the citing journal. For instance, newer citation metrics such as EF, AI, and SJR are weighted so that citations in high-impact journals count for more than citations in lower-impact journals. Arguably, these newer metrics represent not just impact but a more complex construct—something more like “relative impact within a particular subject area, adjusted for the impact of the citing journal.” With the introduction of weights, some conceptual clarity is lost. However, these findings suggest that something is gained as well. If, as this study suggests, weighted citation metrics correspond more closely than unweighted metrics to the journal ratings assigned by subject experts, we can conclude that weighted metrics (1) demonstrate the convergent validity of both weighted metrics and stated preference metrics; (2) better incorporate the perspectives and judgments that respondents rely on, consciously or unconsciously, when assigning journal ratings; (3) raise the possibility that citation metrics or other revealed preference indicators might eventually be used to replicate the results of stated preference studies without the cost of large-scale surveys; and (4) may help provide insight into the ways in which experts assess journal quality, especially with regard to poorly understood issues such as the assignment of journal ratings in the arts and humanities.
Waltman and van Eck (2012, p. 406), noting the proliferation of citation metrics in recent years, have rightly asserted that new metrics should be developed only when they provide “clear added value relative to existing indicators.” However, we should also recognize that the benefits of a particular citation metric are not always apparent until after the metric has been made available for use and evaluation.

6.1. Limitations of the study

This investigation is limited by its scope; it is based on data for just four subject areas in the applied social sciences. Different results might be obtained for the natural sciences, the humanities, or even for social science disciplines such as economics and psychology. Unfortunately, investigations of this type are more difficult in the humanities, where fewer journals are covered by the standard citation metrics; and in the natural sciences, where stated preference studies are less common, perhaps due to the widespread acceptance of indicators such as IF, SNIP, and SJR (Archambault & Larivière, 2010; Jacsó, 2011; Nederhof, 2006; Nisonger, 2004).

Two further limitations can be traced to the availability of metrics and data. First, the analysis incorporates only eight stated preference ratings and seven citation metrics. Although the methodological differences among the stated preference metrics—differences in the respondent populations and the phrasing of the survey questions, for instance—appear to have only a modest impact on the results of the analysis, we cannot claim that the eight metrics shown in Table 2 represent the full range of methodological possibilities. This same limitation makes it impossible to formally control for the effects of covariates such as respondent group (faculty, deans, or practitioners) and data source (WoS, Scopus, or Google Scholar). As described in Section 4.2, we can minimize the effects of these extraneous factors by choosing variables that are comparable except for the characteristics of interest. However, we cannot evaluate the impact of each factor (e.g., data source) using the available metrics and data.

A second data-related limitation is the exclusion of journals due to missing data. Although small sample size is a problem (Tables 4, 5, 7 and 8), a more serious difficulty is the potential for bias when lower-ranked journals are disproportionately
excluded from the analysis. (Presumably, the journals with modest reputations or more limited impact are less likely to be included in stated preference surveys and citation databases.) As a result, the findings of this study may not reflect the full range of variation in reputation and impact that exists within the complete set of journals in each subject area.

Finally, the differences in magnitude between the correlations reported here—size-independent versus size-dependent, weighted versus unweighted—are often quite small. In Table 5, for instance, the average difference between the size-dependent and size-independent correlations is just 0.13. The claim that stated preference ratings are more closely related to size-independent metrics than to size-dependent metrics is therefore based not on the strength of the relationships shown in Table 5, but on their consistency (i.e., 8 of the 10 comparisons). Overall, these results must be regarded as suggestive rather than conclusive.

6.2. Further research

As noted in Section 6.1, further research will allow us to evaluate the relationships between stated preference ratings and citation metrics for a broader range of subject areas, research methods, respondent groups, data sources, and journals. Although this analysis has not tested the assertion that unweighted metrics represent popularity while weighted metrics represent prestige (Bollen et al., 2006; Ding & Cronin, 2011; Franceschet, 2010; Yan & Ding, 2010), it does suggest further research along those lines. One strategy is to evaluate whether prestige-based stated preference metrics—those that focus specifically on prestige rather than other constructs—have an especially close connection to weighted citation metrics. A second strategy, based on the postulated link between prestige and expert endorsements, is to investigate the differences between the journal ratings of top scholars and those of other faculty.

Further research might also examine the impact of timing on stated preference ratings. It is possible, for instance, that respondents rate journals based at least partly on the journals’ past reputations rather than their current characteristics. Likewise, respondents may be influenced by previous years’ citation metrics—the impact factors posted on publishers’ web sites, for instance. It is possible that the widespread availability of citation data in recent years has led to a closer relationship between citation metrics and stated preference ratings.

Author contributions

Conceived and designed the analysis: William H. Walters. Collected the data: William H. Walters. Contributed data or analysis tools: William H. Walters. Performed the analysis: William H. Walters. Wrote the paper: William H. Walters.

Acknowledgements

I am grateful for the comments of Esther Isabelle Wilder and three anonymous referees. Support for this research was provided by Manhattan College, Menlo College, Harris Manchester College, and Oxford University.

References

Abrizah, A., Zainab, A. N., Kiran, K., & Raj, R. G. (2013). LIS journals scientific impact and subject categorization: A comparison between Web of Science and Scopus. Scientometrics, 94(2), 721–740.
Ahlgren, P., & Waltman, L. (2014). The correlation between citation-based and expert-based assessments of publication channels: SNIP and SJR vs. Norwegian quality assessments. Journal of Informetrics, 8(4), 985–996.
Archambault, É., & Larivière, V. (2010). The limits of bibliometrics for the analysis of the social sciences and humanities literature. In F. Caillods (Ed.), World social science report: Knowledge divides (pp. 251–254). Paris, France: UNESCO Publishing. http://unesdoc.unesco.org/images/0018/001883/188333e.pdf
Bartels, J. M., Magun-Jackson, S., Ryan, J. J., & Glass, L. A. (2008). A subjective and objective assessment of journals in educational psychology. Psychology and Education, 45(2), 29–35.
Bollen, J., & Van de Sompel, H. (2008). Usage impact factor: The effects of sample characteristics on usage-based impact metrics. Journal of the American Society for Information Science and Technology, 59(1), 136–149.
Bollen, J., Rodriguez, M. A., & Van de Sompel, H. (2006). Journal status. Scientometrics, 69(3), 669–687.
Bollen, J., Van de Sompel, H., Hagberg, A., & Chute, R. (2009). A principal component analysis of 39 scientific impact measures. PLoS ONE, 4(6), e6022. http://dx.doi.org/10.1371/journal.pone.0006022
Cahn, E. S. (2014). Journal rankings: Comparing reputation, citation and acceptance rates. International Journal of Information Systems in the Service Sector, 6(4), 92–103.
Calver, M. C., & Bradley, J. S. (2009). Should we use the mean citations per paper to summarise a journal’s impact or to rank journals in the same field? Scientometrics, 81(3), 611–615.
Centre for Science and Technology Studies. (2017). CWTS journal indicators. http://www.journalindicators.com/
Christenson, J. A., & Sigelman, L. (1985). Accrediting knowledge: Journal stature and citation impact in social science. Social Science Quarterly, 66(4), 964–975.
Clarivate Analytics. (2017). Journal citation reports. http://clarivate.com/scientific-and-academic-research/research-evalution/journal-citation-reports/
Cohn, E. G., & Farrington, D. P. (2011). Scholarly influence and prestige in criminology and criminal justice. Journal of Criminal Justice Education, 22(1), 5–11.
Crookes, P. A., Reis, S. L., & Jones, S. C. (2010). The development of a ranking tool for refereed journals in which nursing and midwifery researchers publish their work. Nurse Education Today, 30(5), 420–427.
Curry, D., Van de Walle, S., & Gadellaa, S. (2014). Public administration as an academic discipline: Trends and changes in the COCOPS academic survey of European public administration scholars. Rotterdam, The Netherlands: Coordinating for Cohesion in the Public Sector of the Future. http://www.cocops.eu/wp-content/uploads/2014/02/COCOPS PAasadiscipline report 09.02.pdf
Diedenhofen, B., & Musch, J. (2015). Cocor: A comprehensive solution for the statistical comparison of correlations. PLoS ONE, 10(4), e0121945. http://dx.doi.org/10.1371/journal.pone.0121945
Diedenhofen, B., & Musch, J. (2017). Cocor: Comparing correlations. http://comparingcorrelations.org/
Ding, Y., & Cronin, B. (2011). Popular and/or prestigious? Measures of scholarly esteem. Information Processing & Management, 47(1), 80–96.
Donohue, J. M., & Fox, J. B. (2000). A multi-method evaluation of journals in the decision and management sciences by US academics. Omega, 28(1), 17–36.
Dul, J., Karwowski, W., & Vinken, J. (2005). Objective and subjective rankings of scientific journals in the field of ergonomics: 2004–2005. Human Factors and Ergonomics in Manufacturing, 15(3), 327–332.
Ellis, L. V., & Durden, G. C. (1991). Why economists rank their journals the way they do. Journal of Economics and Business, 43(3), 265–270.
Elsevier. (2017). Scopus journal metrics. https://www.journalmetrics.com/
Franceschet, M. (2009). A cluster analysis of scholar and journal bibliometric indicators. Journal of the American Society for Information Science and Technology, 60(10), 1950–1964.
Franceschet, M. (2010). The difference between popularity and prestige in the sciences and in the social sciences: A bibliometric analysis. Journal of Informetrics, 4(1), 55–63.
Garand, J. C. (1990). An alternative interpretation of recent political science journal evaluations. PS: Political Science & Politics, 23(3), 448–451.
Giles, M. W., & Garand, J. C. (2007). Ranking political science journals: Reputational and citational approaches. PS: Political Science & Politics, 40(4), 741–751.
Goldstein, H., & Maier, G. (2010). The use and valuation of journals in planning scholarship: Peer assessment versus impact factors. Journal of Planning Education and Research, 30(1), 66–75.
González-Pereira, B., Guerrero-Bote, V. P., & de Moya-Anegón, F. (2010). A new approach to the metric of journals’ scientific prestige: The SJR indicator. Journal of Informetrics, 4(3), 379–391.
Google. (2017). Google scholar metrics. https://scholar.google.com/citations?view_op=top_venues
Gorman, M. F., & Kanet, J. J. (2005). Evaluating operations management-related journals via the Author Affiliation Index. Manufacturing & Service Operations Management, 7(1), 3–19.
Gorraiz, J., & Gumpenberger, C. (2010). Going beyond citations: SERUM—A new tool provided by a network of libraries. LIBER Quarterly, 20(1), 80–93.
Guerrero-Bote, V. P., & de Moya-Anegón, F. (2012). A further step forward in measuring journals’ scientific prestige: The SJR2 indicator. Journal of Informetrics, 6(4), 674–688.
Haddawy, P., Hassan, S.-U., Asghar, A., & Amin, S. (2016). A comprehensive examination of the relation of three citation-based journal metrics to expert judgment of journal quality. Journal of Informetrics, 10(1), 162–173.
Haddow, G., & Genoni, P. (2010). Citation analysis and peer ranking of Australian social science journals. Scientometrics, 85(2), 471–487.
He, C., & Pao, M. L. (1986). A discipline-specific journal selection algorithm. Information Processing & Management, 22(5), 405–416.
Howell, D. C. (2013). Statistical methods for psychology (8th ed.). Belmont, CA: Wadsworth.
Jacsó, P. (2011). The h-index, h-core citation rate and the bibliometric profile of the Scopus database. Online Information Review, 35(3), 492–501.
Katerattanakul, P., Razi, M. A., Han, B. T., & Kam, H.-J. (2005). Consistency and concern on IS journal rankings. Journal of Information Technology Theory and Application, 7(2), 1–20.
Lange, T. (2006). The imprecise science of evaluating scholarly performance: Using broad quality categories for an assessment of business and management journals. Evaluation Review, 30(4), 505–532.
Lee, F. S., Cronin, B. C., McConnell, S., & Dean, E. (2010). Research quality rankings of heterodox economic journals in a contested discipline. American Journal of Economics and Sociology, 69(5), 1409–1452.
Leydesdorff, L. (2009). How are new citation-based journal indicators adding to the bibliometric toolbox? Journal of the American Society for Information Science and Technology, 60(7), 1327–1336.
Luke, R. H., & Doke, E. R. (1987). Marketing journal hierarchies: Faculty perceptions, 1986–87. Journal of the Academy of Marketing Science, 15(1), 74–78.
Maier, G. (2006). Impact factors and peer judgment: The case of regional science journals. Scientometrics, 69(3), 651–667.
Manzari, L. (2013). Library and information science journal prestige as assessed by library and information science faculty. Library Quarterly, 83(1), 42–60.
Marsh, S. J., & Hunt, C. S. (2006). Not quite as simple as A–B–C: Reflections on one department’s experiences with publication ranking. Journal of Management Inquiry, 15(3), 301–315.
Martin, B. R. (1996). The use of multiple indicators in the assessment of basic research. Scientometrics, 36(3), 343–362.
Mason, P. M., Steagall, J. W., & Fabritius, M. M. (1997). Economics journal rankings by type of school: Perceptions versus citations. Quarterly Journal of Business and Economics, 36(1), 69–79.
Miller, T. R., & Dodge, H. R. (1979). Ratings of professional journals by teachers of management. Improving College and University Teaching, 27(3), 102–103.
Mingers, J., & Harzing, A.-W. (2007). Ranking journals in business and management: A statistical analysis of the Harzing data set. European Journal of Information Systems, 16(4), 303–316.
Moed, H. F. (2010). Measuring contextual citation impact of scientific journals. Journal of Informetrics, 4(3), 265–277.
Morris, H., Harvey, C., & Kelly, A. (2009). Journal rankings and the ABS Journal Quality Guide. Management Decision, 47(9), 1441–1451.
Nederhof, A. J. (2006). Bibliometric monitoring of research performance in the social sciences and the humanities: A review. Scientometrics, 66(1), 81–100.
Nederhof, A. J., Luwel, M., & Moed, H. F. (2001). Assessing the quality of scholarly journals in linguistics: An alternative to citation-based journal impact factors. Scientometrics, 51(1), 241–265.
Nelson, T. M., Buss, A. R., & Katzko, M. (1983). Rating of scholarly journals by chairpersons in the social sciences. Research in Higher Education, 19(4), 469–497.
Ni, C., Sugimoto, C. R., & Cronin, B. (2013). Visualizing and comparing four facets of scholarly communication: Producers, artifacts, concepts, and gatekeepers. Scientometrics, 94(3), 1161–1173.
Nisonger, T. E. (2004). The benefits and drawbacks of impact factor for journal collection management in libraries. The Serials Librarian, 47(1–2), 57–75.
Nisonger, T. E., & Davis, C. H. (2005). The perception of library and information science journals by LIS education deans and ARL library directors: A replication of the Kohl–Davis study. College & Research Libraries, 66(4), 341–377.
Nixon, J. M. (2014). Core journals in library and information science: Developing a methodology for ranking LIS journals. College & Research Libraries, 75(1), 66–90.
Nkereuwem, E. E. (1997). Accrediting knowledge: The ranking of library and information science journals. Library Review, 46(2), 99–104.
Odell, J., & Gabbard, R. (2008). The interdisciplinary influence of library and information science 1996–2004: A journal-to-journal citation analysis. College & Research Libraries, 69(6), 546–564.
Olson, J. E. (2005). Top-25-business-school professors rate journals in operations management and related fields. Interfaces, 35(4), 323–338.
Polonsky, M. J., & Whitelaw, P. (2005). What are we measuring when we evaluate journals? Journal of Marketing Education, 27(2), 189–201.
Rosenstreich, D., & Wooliscroft, B. (2009). Measuring the impact of accounting journals using Google Scholar and the g-index. British Accounting Review, 41(4), 227–239.
Saarela, M., Kärkkäinen, T., Lahtonen, T., & Rossi, T. (2016). Expert-based versus citation-based ranking of scholarly and scientific publication channels. Journal of Informetrics, 10(3), 693–718.
Schlögl, C., & Gorraiz, J. (2010). Comparison of citation and usage indicators: The case of oncology journals. Scientometrics, 82(3), 567–580.
Schlögl, C., & Stock, W. G. (2004). Impact of relevance of LIS journals: A scientometric analysis of international and German-language LIS journals—Citation analysis versus reader survey. Journal of the American Society for Information Science and Technology, 55(13), 1155–1168.
SCImago Research Group. (2017). SCImago journal & country rank. http://www.scimagojr.com/
Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. BMJ: British Medical Journal, 314(7079), 498–513.
Sellers, S. L., Mathiesen, S. G., Perry, R., & Smith, T. (2004). Evaluation of social work journal quality: Citation versus reputation approaches. Journal of Social Work Education, 40(1), 143–160.
Sellers, S. L., Mathiesen, S. G., Smith, T., & Perry, R. (2006). Perceptions of professional social work journals: Findings from a national survey. Journal of Social Work Education, 42(1), 139–160.
Serenko, A., & Bontis, N. (2011). What’s familiar is excellent: The impact of exposure effect on perceived journal quality. Journal of Informetrics, 5(1), 219–223.
Serenko, A., & Dohan, M. (2011). Comparing the expert survey and citation impact journal ranking methods: Example from the field of artificial intelligence. Journal of Informetrics, 5(4), 629–648.
Shilbury, D., & Rentschler, R. (2007). Assessing sport management journals: A multi-dimensional examination. Sport Management Review, 10(1), 31–44.
So, C. Y. K. (1998). Citation ranking versus expert judgment in evaluating communication scholars: Effects of research specialty size and individual prominence. Scientometrics, 41(3), 325–333.
Sorensen, J. R. (2009). An assessment of the relative impact of criminal justice and criminology journals. Journal of Criminal Justice, 37(5), 505–511.
Sorensen, J. R., Snell, C., & Rodriguez, J. J. (2006). An assessment of criminal justice and criminology journal prestige. Journal of Criminal Justice Education, 17(2), 297–322.
Tahai, A., & Meyer, M. J. (1999). A revealed preference study of management journals’ direct influences. Strategic Management Journal, 20(3), 279–296.
Tsai, C.-F. (2014). Citation impact analysis of top ranked computer science journals and their rankings. Journal of Informetrics, 8(2), 318–328.
Turban, E., Zhou, D., & Ma, J. (2004). A group decision support approach to evaluating journals. Information & Management, 42(1), 31–44.
University of Washington. (2017). Eigenfactor.org. http://www.eigenfactor.org/projects/journalRank/journalsearch.php
Vanclay, J. K. (2008). Ranking forestry journals using the h-index. Journal of Informetrics, 2(4), 326–334.
Walters, W. H. (2016a). Beyond use statistics: Recall, precision, and relevance in the assessment and management of academic libraries. Journal of Librarianship and Information Science, 48(4), 340–352.
Walters, W. H. (2016b). Evaluating online resources for college and university libraries: Assessing value and cost based on academic needs. Serials Review, 42(1), 10–17.
Walters, W. H. (2016c). Information sources and indicators for the assessment of journal reputation and impact. The Reference Librarian, 57(1), 13–22.
Waltman, L., & van Eck, N. J. (2012). The inconsistency of the h-index. Journal of the American Society for Information Science and Technology, 63(2), 406–415.
Waltman, L., van Eck, N. J., van Leeuwen, T. N., & Visser, M. S. (2013). Some modifications to the SNIP journal impact indicator. Journal of Informetrics, 7(2), 272–285.
Weisheit, R. A., & Regoli, R. M. (1984). Ranking journals. Scholarly Publishing, 15(4), 313–325.
West, J. D., Bergstrom, C. T., Althouse, B. M., Rosvall, M., Bergstrom, T. C., & Vilhena, D. (2015). About the Eigenfactor project. Seattle: University of Washington. http://www.eigenfactor.org/about.php
West, J. D., Bergstrom, T. C., & Bergstrom, C. T. (2010). The Eigenfactor metrics: A network approach to assessing scholarly journals. College & Research Libraries, 71(3), 236–244.
West, R., & McIlwaine, A. (2002). What do citation counts count for in the field of addiction? An empirical evaluation of citation counts and their link with peer ratings of quality. Addiction, 97(5), 501–504.
Williams, E. J. (1959). The comparison of regression variables. Journal of the Royal Statistical Society, Series B: Methodological, 21(2), 396–399.
Yan, E., & Ding, Y. (2010). Weighted citation: An indicator of an article’s prestige. Journal of the American Society for Information Science and Technology, 61(8), 1635–1643.
Yan, E., Ding, Y., & Sugimoto, C. R. (2011). P-Rank: An indicator measuring prestige in heterogeneous scholarly networks. Journal of the American Society for Information Science and Technology, 62(3), 467–477.
Zhang, C., Liu, X., Xu, Y. C., & Wang, Y. (2011). Quality-structure index: A new metric to measure scientific journal influence. Journal of the American Society for Information Science and Technology, 62(4), 643–653.