Axiomatics for the Hirsch index and the Egghe index



Journal of Informetrics 5 (2011) 476–480

Contents lists available at ScienceDirect

Journal of Informetrics journal homepage: www.elsevier.com/locate/joi

Short communication

Axiomatics for the Hirsch index and the Egghe index

Antonio Quesada
Departament d’Economia, Universitat Rovira i Virgili, Avinguda de la Universitat 1, 43204 Reus, Spain

Article info

Article history: Received 11 November 2010; received in revised form 25 January 2011; accepted 26 January 2011.
Keywords: Hirsch index; Egghe index; Citation analysis; Axiomatization

Abstract

The Hirsch index and the Egghe index are both numbers that synthesize a researcher’s output. The h-index associated with researcher r is the maximum number h such that r has h papers with at least h citations each. The g-index is the maximum number g such that the g most cited papers by r have, on average, at least g citations. Both indices are characterized in terms of four axioms. One identifies outputs that deserve an index of at most one. A second establishes a strong monotonicity condition. A third requires the index to satisfy a property of subadditivity. The last is a monotonicity condition for the h-index and an aggregate monotonicity condition for the g-index. © 2011 Elsevier Ltd. All rights reserved.

1. Introduction

This paper provides axiomatic characterizations of two indices that have been suggested to summarize a researcher’s output: the h-index by Hirsch (2005) and the g-index by Egghe (2006a, 2006b). The h-index of the output of a researcher r is the maximum number h of papers by r having at least h citations each. The g-index is the maximum number g such that the g most cited papers by r have, on average, at least g citations. Characterizations of both indices already exist: Woeginger (2008a, 2008b) and Quesada (2009, 2010a, 2010b) axiomatize the h-index, whereas Woeginger (2008c) axiomatizes the g-index. Woeginger (2009) also characterizes the generalized g-index, obtained by replacing “at least g” in the above definition of the g-index with “at least s(g)”, where s belongs to a class of mappings that includes the identity function. In a framework slightly different from the one adopted in those papers, Marchant (2009) characterizes the ranking induced by the h-index instead of the h-index itself (the index itself could likely be characterized just by adding axioms, such as Z in Section 2, that ascribe appropriate values to the index). Marchant’s framework is based on the definition of a binary relation over researchers, so that indices can be viewed as numerical representations of that relation. Adapting concepts from consumer theory in economics, Burgos (2010) interprets that binary relation as a preference relation and suggests defining indices as arising from the maximization of that preference. The added value of the axiomatizations presented here is that similar axioms characterize the two indices. Specifically, the difference between the h-index and the g-index can essentially be traced back to the choice between two versions of a monotonicity axiom. Proposition 3.3 characterizes the h-index in terms of the four axioms U, S, M, and SM.
Proposition 3.7 characterizes the g-index in terms of the four axioms U′, S, AM, and SM. SM is a strong monotonicity condition stating that, in passing from output x to output y, the index increases if y has more papers than x and each paper in y receives more citations than the most cited paper in x. S is a subadditivity requirement: if output x can be decomposed into two outputs y and z, then the index of x is not greater than the sum of the index of y and the index of z. S captures the idea that the value of an output considered in isolation may differ from the value of the same output considered as part of a larger output. Hence, the marginal contribution of additional papers or citations depends on the papers and citations already accumulated. In particular, S expresses the view that the evaluation of outputs is subject to diminishing marginal productivity: the larger the output, the less value a citation or a paper adds to it. U can be taken to be the condition that the index agrees with the h-index for outputs having h-index equal to one. U′ is the same condition with respect to the g-index. U and U′ can then be interpreted as making the index adopt the same units as the h-index and the g-index, respectively. M (Woeginger’s monotonicity) and AM (aggregate monotonicity) are the two versions of monotonicity that separate the h-index from the g-index. M asserts that adding more citations or papers to an existing output does not lower the index. AM holds that the index is not reduced by adding more papers or citations to an existing output or by rearranging citations in such a way that the cumulative number of citations, counted from the most to the least cited paper, does not diminish.

1751-1577/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.joi.2011.01.009

2. Definitions and axioms

Let N be the set of non-negative integers, N+ the set of positive integers, and ∅ the empty set. The linear ordering < on N is extended to N ∪ {∅} by setting ∅ < n for all n ∈ N. Define X to be the set of all vectors x = (x_1, …, x_n) such that n ∈ N+, x_1 ≥ x_2 ≥ … ≥ x_n, and x_i ∈ N for all i ∈ {1, …, n}. A member x = (x_1, …, x_n) of X represents a research output consisting of n papers and, for each paper i, the number x_i of citations that i receives. Define X̄ = X ∪ {∅}, where the empty set designates the empty output. For x ∈ X̄, d_x is the number of papers in x (or the dimension of x). For a rational number r, let int(r) be the integer part of r. For example, int(2/3) = 0, int(11/3) = 3, and int(6/3) = int(2) = 2.
For x ∈ X and n ∈ {1, …, d_x}, define c_n^x = x_1 + x_2 + … + x_n to be the total number of citations received by the first n papers in output x and let a_n^x = int(c_n^x / n). The number a_n^x represents the average number of citations of the first n papers in output x when averaging is performed within the domain of the integers. For instance, if x = (9, 1, 1), then c_1^x = 9, c_2^x = 10, c_3^x = 11, a_1^x = 9, a_2^x = 5, and a_3^x = 3. For x ∈ X, let a^x = (a_1^x, a_2^x, …, a_{d_x}^x). If x = ∅, then define a^x = ∅.
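The definitions of c_n^x and a_n^x translate directly into code. The sketch below is illustrative only: it assumes an output is encoded as a Python tuple in non-increasing order, with the empty tuple standing for ∅, and its function names are hypothetical. It reproduces the worked example x = (9, 1, 1).

```python
from itertools import accumulate

# Sketch under an assumed encoding: an output x is a tuple of non-negative
# ints in non-increasing order; the empty tuple () stands for the empty output.
# Function names are hypothetical.


def cumulative_citations(x):
    """Return (c_1^x, ..., c_{d_x}^x), where c_n^x = x_1 + ... + x_n."""
    return tuple(accumulate(x))


def integer_averages(x):
    """Return a^x = (a_1^x, ..., a_{d_x}^x), where a_n^x = int(c_n^x / n).

    For non-negative integers, int(c / n) equals c // n, so the integer
    part is taken exactly, without floating-point division.
    """
    return tuple(c // n for n, c in enumerate(cumulative_citations(x), start=1))


print(cumulative_citations((9, 1, 1)))  # (9, 10, 11)
print(integer_averages((9, 1, 1)))      # (9, 5, 3)
```

Floor division is used instead of `int(c / n)` so that large citation counts never pass through floating point.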

Definition 2.1.

A research output index (or index, for short) is a mapping f: X̄ → N.

Definition 2.2. The h-index (or Hirsch index) is the index h such that, for all x ∈ X̄: (i) if d_x = 0 or x_1 = 0, then h(x) = 0; and (ii) otherwise, h(x) = max{n ∈ {1, …, d_x}: x_n ≥ n}.

Definition 2.3. The g-index (or Egghe index) is the index g such that, for all x ∈ X̄: (i) if d_x = 0 or x_1 = 0, then g(x) = 0; and (ii) otherwise, g(x) = max{n ∈ {1, …, d_x}: a_n^x ≥ n}.

Remark 2.4.

For all x ∈ X̄, g(x) = h(a^x).
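A minimal sketch of Definitions 2.2 and 2.3, under the same assumed encoding (tuples in non-increasing order, () for ∅, hypothetical function names). Applying `h_index` to the vector of integer averages gives the g-index, as Remark 2.4 states; the example is the output x = (9, 4, 2, 1) discussed in the text.

```python
from itertools import accumulate

# Sketch under an assumed encoding: outputs are non-increasing tuples of
# non-negative ints and () is the empty output. Function names are hypothetical.


def integer_averages(x):
    """a^x, with a_n^x = int((x_1 + ... + x_n) / n)."""
    return tuple(c // n for n, c in enumerate(accumulate(x), start=1))


def h_index(x):
    """Definition 2.2: the largest n <= d_x with x_n >= n.

    If no n qualifies (d_x = 0 or x_1 = 0), the result is 0, as in case (i).
    """
    return max((n for n, xn in enumerate(x, start=1) if xn >= n), default=0)


def g_index(x):
    """Definition 2.3: the largest n <= d_x with a_n^x >= n, or 0 if none."""
    averages = integer_averages(x)
    return max((n for n, an in enumerate(averages, start=1) if an >= n), default=0)


x = (9, 4, 2, 1)
print(integer_averages(x))           # (9, 6, 5, 4)
print(h_index(x), g_index(x))        # 2 4
print(h_index(integer_averages(x)))  # 4, as Remark 2.4 predicts
```

The `default=0` argument of `max` covers case (i) of both definitions without a separate branch.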

The g-index can be viewed as the h-index applied to the vector a^x of average citations rather than to the original vector x of citations. To illustrate the difference between h and g, let x = (9, 4, 2, 1). In this case, a^x = (9, 6, 5, 4), h(x) = 2, and g(x) = 4 = h(a^x).

Definition 2.5. The binary relation < on X̄ is such that: (i) for all x ∈ X, ∅ < x if and only if x_1 > 0; and (ii) for all x ∈ X and y ∈ X, x < y if and only if d_x < d_y and x_1 < y_{d_y}. In words, x < y if x is empty and y has some cited paper, or x has fewer papers than y and the least cited paper in y has more citations than the most cited paper in x.

Definition 2.6. The binary relation ≤ on X̄ is such that: (i) for all x ∈ X̄, ∅ ≤ x; and (ii) for all x ∈ X and y ∈ X, x ≤ y if and only if d_x ≤ d_y and x_i ≤ y_i for all i ∈ {1, …, d_x}. That is, x ≤ y if x is empty, or x does not have more papers than y and, for all i, the ith paper in x does not have more citations than the ith paper in y. It is worth noticing that < is not the strict part of ≤: x ≤ y together with not (y ≤ x) does not imply x < y. For instance, (2, 2) ≤ (3, 3, 2) and it is not the case that (3, 3, 2) ≤ (2, 2), yet it is not the case that (2, 2) < (3, 3, 2) either.

SM. Strong monotonicity. For all x ∈ X̄ and y ∈ X̄, x < y implies f(x) < f(y).

The fact that x < y suggests that output x is less valuable than output y in every respect: x has fewer papers than y and each paper in x has fewer citations than the least cited paper in y. SM requires an index to be consistent with this idea: if x is worse than y in every respect, then f(x) < f(y).

Define p_0 = ∅ and, for n ∈ N+, define p_n to be the member of X having exactly n papers, each with n citations.

Remark 2.7.

If f satisfies SM, then f(p_n) ≥ n for all n ∈ N.

That f(p_0) ≥ 0 follows from the definition of index. Letting n ∈ N+, Definition 2.5 gives p_0 < p_1 < … < p_n. By SM, f(p_0) < f(p_1) < … < f(p_n). Since f takes values in N, {f(p_0), f(p_1), …, f(p_n)} consists of n + 1 different non-negative integers. As a result, if f(p_n) = k < n, then these n + 1 distinct values would all lie in {0, 1, …, k}, a set with only k + 1 < n + 1 members, which is impossible.

M. Monotonicity. For all x ∈ X̄ and y ∈ X̄, x ≤ y implies f(x) ≤ f(y).

M captures the idea that more citations or more papers cannot lower the index. Woeginger (2008a, 2008b, 2008c) makes an index satisfy M by definition. Though the h-index satisfies M, it does not satisfy the strict version of M: if d_x < d_y and x_i < y_i for all i ∈ {1, …, d_x}, then f(x) < f(y). For example, h(2, 2) = 2 = h(3, 3, 2).


AM. Aggregate monotonicity. For all x ∈ X̄ and y ∈ X̄, a^x ≤ a^y implies f(x) ≤ f(y).

Given that x ≤ y implies a^x ≤ a^y, M can be seen as a weakening of AM. Whereas M depends on the values of the components of the vector representing the research output, AM depends on the cumulative sums of those values. In particular, AM implies the following: if d_x ≤ d_y and c_n^x ≤ c_n^y for all n ∈ {1, …, d_x}, then f(x) ≤ f(y).

Define V to be the set of all vectors x = (x_1, …, x_n) such that n ∈ N+ and x_i ∈ N for all i ∈ {1, …, n}. For x ∈ V, x* is the member of X obtained from x by arranging its components in non-increasing order. For instance, if x = (0, 5, 2, 3, 5), then x* = (5, 5, 3, 2, 0). For x ∈ V and y ∈ V, x + y is the member z of V such that d_z = max{d_x, d_y} and, for all i ∈ {1, …, d_z}: (i) if i > d_x, then z_i = y_i; (ii) if i > d_y, then z_i = x_i; and (iii) if i ≤ d_x and i ≤ d_y, then z_i = x_i + y_i. For example, if x = (0, 3, 2) and y = (5, 6), then x + y = (5, 9, 2).

S. Subadditivity. For all x ∈ V and y ∈ V such that x + y ∈ X, f(x + y) ≤ f(x*) + f(y*).

Consider any z ∈ X and any decomposition of z into two vectors x and y with non-negative integer components, so that z = x + y. Then S requires that the index of z not be greater than the sum of the indices associated with x* and y*. For example, since (4, 4, 3, 1) = (1, 4, 2, 0) + (3, 0, 1, 1), S implies f(4, 4, 3, 1) ≤ f(4, 2, 1, 0) + f(3, 1, 1, 0). S expresses the idea that it becomes harder to increase an index as the research output expands. S is related to the property of decreasing marginal contribution of citations and papers: the larger the output, the smaller the impact on the index of an additional paper or an additional citation.

U. For all x ∈ X, if (d_x = 1 and x_1 ≥ 1) or (d_x ≥ 2 and x_2 ≤ 1), then f(x) ≤ 1.

U′. For all x ∈ X, if (d_x = 1 and x_1 ≥ 1) or (d_x ≥ 2 and a_2^x ≤ 1), then f(x) ≤ 1.
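The vector operations x + y and x*, together with the AM and S examples, can be sketched as follows. Assumptions: vectors are Python tuples, x + y pads the shorter vector with zeros (which agrees with clauses (i) and (ii) of the definition), and all names are hypothetical. The pair (2, 2) and (4, 0) is an illustrative choice, not taken from the paper; it shows why AM separates h from g. The exhaustive loop is only a finite check of S for h over small vectors, not a proof.

```python
from itertools import accumulate, product

# Sketch under assumptions: vectors in V are tuples of non-negative ints;
# outputs in X are the non-increasing ones. All names are hypothetical.


def add_vectors(x, y):
    """x + y: componentwise sum, with the shorter vector padded by zeros."""
    n = max(len(x), len(y))
    x_padded = tuple(x) + (0,) * (n - len(x))
    y_padded = tuple(y) + (0,) * (n - len(y))
    return tuple(a + b for a, b in zip(x_padded, y_padded))


def star(x):
    """x*: the components of x arranged in non-increasing order."""
    return tuple(sorted(x, reverse=True))


def is_output(v):
    """True when v is non-increasing, i.e. v is a member of X."""
    return all(a >= b for a, b in zip(v, v[1:]))


def integer_averages(x):
    return tuple(c // n for n, c in enumerate(accumulate(x), start=1))


def h_index(x):
    return max((n for n, xn in enumerate(x, start=1) if xn >= n), default=0)


def g_index(x):
    averages = integer_averages(x)
    return max((n for n, an in enumerate(averages, start=1) if an >= n), default=0)


def weakly_below(u, v):
    """The componentwise relation of Definition 2.6, applied here to a^x and a^y."""
    return len(u) <= len(v) and all(a <= b for a, b in zip(u, v))


# AM separates h from g: a^(2,2) = (2,2) <= (4,2) = a^(4,0), yet h(2,2) = 2 > 1 = h(4,0)
print(weakly_below(integer_averages((2, 2)), integer_averages((4, 0))))  # True
print(h_index((2, 2)), h_index((4, 0)), g_index((2, 2)), g_index((4, 0)))  # 2 1 2 2

# S on the decomposition (4, 4, 3, 1) = (1, 4, 2, 0) + (3, 0, 1, 1) from the text
x, y = (1, 4, 2, 0), (3, 0, 1, 1)
z = add_vectors(x, y)
print(z, h_index(z), h_index(star(x)) + h_index(star(y)))  # (4, 4, 3, 1) 3 3
print(g_index(z), g_index(star(x)) + g_index(star(y)))      # 3 4

# Finite check of S for h: all vectors with 1 to 3 components drawn from {0, ..., 3}
vectors = [v for d in range(1, 4) for v in product(range(4), repeat=d)]
print(all(h_index(add_vectors(u, v)) <= h_index(star(u)) + h_index(star(v))
          for u in vectors for v in vectors if is_output(add_vectors(u, v))))  # True
```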
U and U′ are conditions that select certain outputs that do not deserve an index greater than one. If the consequent “f(x) ≤ 1” is replaced by “f(x) = 1”, then U and U′ can be seen as providing the definition of the unit of measurement of the index. Since U and U′ may be regarded as ad hoc requirements, justifications are suggested next in terms of axioms Z, CM, and CM′.

Z. For all x ∈ X̄, if d_x = 0 or x_1 = 0, then f(x) = 0.

CM. Converse strong monotonicity. For all x ∈ X̄ such that f(x) > 0 and all k ∈ {0, 1, …, f(x) − 1}, there is y ∈ X̄ such that f(y) = k, d_y < d_x, and, if d_y ≥ 1, y_1 < x_i for all i ∈ {1, …, d_y}.

CM′. Converse strong average monotonicity. For all x ∈ X̄ such that f(x) > 0 and all k ∈ {0, 1, …, f(x) − 1}, there is y ∈ X̄ such that f(y) = k, d_y < d_x, and, if d_y ≥ 1, a_1^y < a_i^x for all i ∈ {1, …, d_y}.

Z identifies outputs deserving index 0: outputs without papers and outputs without citations. CM is, in essence, the converse of SM: if the index of x is positive, then, for every smaller index k, some output y has index k and fewer papers than x, and, if y is non-empty, the most cited paper in y has fewer citations than each of the first r papers in x, with r being the number of papers in y. CM′ is CM with average citations considered instead of citations.

Remark 2.8.

Z and CM imply U.

Let x ∈ X satisfy (d_x = 1 and x_1 ≥ 1) or (d_x ≥ 2 and x_2 ≤ 1). To show that f(x) ≤ 1, suppose not: f(x) > 1. By CM, for some y ∈ X̄, f(y) = f(x) − 1, d_y < d_x, and, if d_y ≥ 1, y_1 < x_i for all i ∈ {1, …, d_y}. If d_x = 1, then d_y < d_x implies y = ∅. By Z, f(y) = 0, so 0 = f(x) − 1: contradiction. If d_x ≥ 2, then x_2 ≤ 1 and y_1 < x_2 imply y_1 = 0. By Z, f(y) = 0, so 0 = f(x) − 1: contradiction.

Remark 2.9.

Z and CM′ imply U′.

Let x ∈ X satisfy (d_x = 1 and x_1 ≥ 1) or (d_x ≥ 2 and a_2^x ≤ 1). To show that f(x) ≤ 1, suppose not: f(x) > 1. By CM′, for some y ∈ X̄, f(y) = f(x) − 1, d_y < d_x, and, if d_y ≥ 1, a_1^y < a_i^x for all i ∈ {1, …, d_y}. If d_x = 1 and x_1 ≥ 1, then d_y < d_x implies y = ∅. By Z, f(y) = 0. Hence, 0 = f(x) − 1: contradiction. If d_x ≥ 2 and a_2^x ≤ 1, then y = ∅ leads to the same contradiction: by Z, f(x) − 1 = f(y) = 0. So consider y such that d_y ≥ 1. Since a_2^x ≤ 1, a_1^y < a_2^x means that y_1 = 0. By Z, f(y) = 0. That is, f(x) − 1 = f(y) = 0: contradiction.

3. Results

Lemma 3.1.

If an index f satisfies M and SM, then f(x) ≥ h(x) for all x ∈ X.

Proof. Let x ∈ X and h(x) = h. If h = 0, then, by definition of index, f(x) ≥ 0 = h = h(x). If h > 0, then, by Remark 2.7, f(p_h) ≥ h. As h(x) = h, x_n ≥ h for all n ∈ {1, …, h}. Therefore, p_h ≤ x. By M, f(p_h) ≤ f(x). Accordingly, f(x) ≥ h = h(x).

Lemma 3.2.

If an index f satisfies U, S, and SM, then f(x) ≤ h(x) for all x ∈ X.

Proof. For i ∈ N, define H_i = {x ∈ X̄: h(x) = i}.

Part 1: f(x) = h(x) for all x ∈ H_1. If x ∈ H_1, then (d_x = 1 and x_1 ≥ 1) or (d_x ≥ 2 and x_2 ≤ 1). By U, f(x) ≤ 1. Since x ∈ H_1 implies x_1 ≥ 1, ∅ < x. By SM, f(∅) < f(x) ≤ 1. As f(∅) ∈ N, f(∅) = 0. Therefore, f(x) > f(∅) = 0 and f(x) ≤ 1 imply f(x) = 1 = h(x).

Part 2: f(x) = h(x) for all x ∈ H_0. Let x ∈ H_0. This implies d_x = 0 or x_1 = 0. Let y ∈ X satisfy d_y = d_x + 1 and y_i = 1 for all i ∈ {1, …, d_y}. By part 1, f(y) = 1. Since x < y, by SM, f(x) < f(y) = 1. It then follows from f(x) ∈ N that f(x) = 0 = h(x).

Part 3: f(x) ≤ h(x) for all h ∈ N∖{0, 1} and x ∈ H_h. Taking parts 1 and 2 as the base case of an induction argument, choose h ∈ N∖{0, 1} and assume that f(x) ≤ h(x) for all x ∈ H_0 ∪ … ∪ H_{h−1}. To prove that f(x) ≤ h(x) for all x ∈ H_h, choose x ∈ H_h. Then x_h ≥ h and, if d_x ≥ h + 1, x_{h+1} ≤ h. With k being the largest member of {1, …, d_x} such that x_k ≥ h, let y ∈ X be obtained from x by reducing to h − 1 the number of citations of papers h, h + 1, …, k (papers h + 1, …, k, if any, have exactly h citations, since h ≤ x_i ≤ x_{h+1} ≤ h). Specifically, if k = h, then y = (x_1, …, x_{h−1}, h − 1, x_{h+1}, …, x_{d_x}) and, if k > h, then y = (x_1, …, x_{h−1}, h − 1, x_{h+1} − 1, …, x_k − 1, x_{k+1}, …, x_{d_x}). Hence, x = y + z, where z satisfies: (i) z_i = 0 for all i ∈ {1, …, h − 1}; (ii) z_h = x_h − (h − 1); (iii) if k = h, then z_i = 0 for all i ∈ {h + 1, …, d_x}; and (iv) if k > h, then z_i = 1 for all i ∈ {h + 1, …, k} and z_i = 0 for all i ∈ {k + 1, …, d_x}. Clearly, y* = y and z* = (x_h − (h − 1), 1, …, 1, 0, …, 0), where d_{z*} = d_x and there are k − h components equal to 1 in z*. By S, f(x) ≤ f(y*) + f(z*) = f(y) + f(z*). Since h(y) = h − 1 and h(z*) = 1, by the induction hypothesis and part 1, f(x) ≤ f(y) + f(z*) ≤ (h − 1) + 1 = h = h(x).

Proposition 3.3.

An index f satisfies U, S, M, and SM if and only if f = h.
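Before the proof, a finite sanity check of the “⇐” direction may help. The sketch below tests whether h satisfies U, M, and SM (and Z, which Proposition 3.4 adds) on every output with at most four papers and at most five citations per paper. It cannot replace the proof; the bounds and helper names are assumptions chosen for illustration, and S is omitted here because it quantifies over unsorted vectors in V.

```python
from itertools import combinations_with_replacement

# Finite sanity check, not a proof. Assumed encoding: outputs are non-increasing
# tuples of non-negative ints, () is the empty output; all names are hypothetical.


def h_index(x):
    return max((n for n, xn in enumerate(x, start=1) if xn >= n), default=0)


def strictly_below(x, y):  # Definition 2.5
    if not x:
        return len(y) > 0 and y[0] > 0
    return len(x) < len(y) and x[0] < y[-1]


def weakly_below(x, y):  # Definition 2.6
    return len(x) <= len(y) and all(a <= b for a, b in zip(x, y))


def all_outputs(max_papers, max_citations):
    """Every output with at most max_papers papers and max_citations citations each."""
    outputs = [()]
    for d in range(1, max_papers + 1):
        # Drawing from a descending range yields non-increasing tuples.
        outputs.extend(combinations_with_replacement(range(max_citations, -1, -1), d))
    return outputs


def satisfies_U(f, universe):
    return all(f(x) <= 1 for x in universe
               if (len(x) == 1 and x[0] >= 1) or (len(x) >= 2 and x[1] <= 1))


def satisfies_Z(f, universe):
    return all(f(x) == 0 for x in universe if len(x) == 0 or x[0] == 0)


def satisfies_M(f, universe):
    return all(f(x) <= f(y) for x in universe for y in universe if weakly_below(x, y))


def satisfies_SM(f, universe):
    return all(f(x) < f(y) for x in universe for y in universe if strictly_below(x, y))


universe = all_outputs(4, 5)  # 210 outputs
checks = (satisfies_U, satisfies_Z, satisfies_M, satisfies_SM)
print([check(h_index, universe) for check in checks])  # [True, True, True, True]
```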

Proof. “⇒” Lemmas 3.1 and 3.2. “⇐” It should not be difficult to verify that h satisfies U and M. With respect to SM, suppose x < y. If x = ∅, then, by Definition 2.5(i), y_1 > 0. That is, h(∅) = 0 and h(y) ≥ 1, so h(x) < h(y). If x ≠ ∅, then y ≠ ∅ and, by Definition 2.5(ii), d_x < d_y and x_1 < y_{d_y}. Letting h(x) = h, it follows that h ≤ d_x < d_y, so y_{h+1} exists, and that x_1 ≥ h. Since y_{h+1} ≥ y_{d_y} > x_1 ≥ h, each of y_1, …, y_{h+1} has at least h + 1 citations. Hence, h(y) ≥ h + 1 = h(x) + 1. As regards S, assume that x ∈ V and y ∈ V are such that x + y ∈ X. Letting h(x*) = h_x and h(y*) = h_y, it must be shown that h(x + y) ≤ h_x + h_y. Since h(x*) = h_x, at most h_x papers in x have more than h_x citations. Now add y to x to obtain x + y. Having h(y*) = h_y means that at most h_y papers in y have more than h_y citations. A paper in x + y with more than h_x + h_y citations must have more than h_x citations in x or more than h_y citations in y. Therefore, x + y cannot have h_x + h_y + 1 papers with more than h_x + h_y citations each, so h(x + y) ≤ h_x + h_y.

Proposition 3.4.

An index f satisfies Z, CM, S, M, and SM if and only if f = h.

Proof. By Remark 2.8 and Proposition 3.3, it is enough to show that h satisfies Z and CM. It is plain that h satisfies Z. With respect to CM, suppose h(x) > 0 and k ∈ {0, 1, …, h(x) − 1}. It must be shown that, for some y ∈ X̄, h(y) = k, d_y < d_x, and, if d_y ≥ 1, y_1 < x_i for all i ∈ {1, …, d_y}. If h(x) = 1, then x_1 ≥ 1 and k = 0, so y = ∅ satisfies h(y) = k and 0 = d_y < d_x. If h(x) > 1, then there are h(x) papers with at least h(x) citations each. Taking y = p_k, h(y) = k. Since k < h(x) ≤ d_x, d_y < d_x. Finally, x is such that x_i ≥ h(x) > k for all i ∈ {1, …, h(x)}. Hence, k < h(x) implies that, for all i ∈ {1, …, d_y} = {1, …, k}, y_1 = k < x_i.

Lemma 3.5.

If an index f satisfies AM and SM, then f(x) ≥ g(x) for all x ∈ X.

Proof. Let x ∈ X and g(x) = g. If g = 0, then, by definition of index, f(x) ≥ 0 = g. If g > 0, then define y = p_g. Let z ∈ X satisfy: (i) d_z = d_x; (ii) z_i = g for all i ∈ {1, …, g}; and (iii) if d_x > g, then z_i = x_i for all i ∈ {g + 1, …, d_x} (z is non-increasing because g(x) = g forces x_{g+1} ≤ g). Since y_i = z_i for all i ∈ {1, …, d_y}, a^y ≤ a^z. By AM, f(y) ≤ f(z). By Remark 2.7, f(y) ≥ g. Accordingly, f(z) ≥ g. As g(x) = g, a_g^x ≥ g, so c_g^x ≥ g^2. Because x is non-increasing, the average number of citations of its first i papers cannot fall as i decreases; hence, for all i ∈ {1, …, g}, c_i^x / i ≥ c_g^x / g ≥ g, that is, a_i^z = g ≤ a_i^x. Moreover, c_g^z = g^2 ≤ c_g^x and z_i = x_i for i > g, so, for all i ∈ {g + 1, …, d_x}, c_i^z ≤ c_i^x and therefore a_i^z ≤ a_i^x. As a result, a^z ≤ a^x. By AM, g ≤ f(z) ≤ f(x).

Lemma 3.6.

If an index f satisfies U′, S, and SM, then f(x) ≤ g(x) for all x ∈ X.

Proof. For i ∈ N, define G_i = {x ∈ X̄: g(x) = i}.

Part 1: f(x) = g(x) for all x ∈ G_1. If g(x) = 1, then d_x ≥ 1. If d_x = 1, then x_1 ≥ 1, and, if d_x ≥ 2, then x_1 + x_2 ≤ 3, that is, a_2^x ≤ 1. In both cases, by U′, f(x) ≤ 1. In addition, g(x) = 1 implies x_1 ≥ 1, so ∅ < x. By SM, f(∅) < f(x) ≤ 1. As f(∅) ∈ N, f(∅) = 0. In sum, f(x) > f(∅) = 0 and f(x) ≤ 1 imply f(x) = 1 = g(x).

Part 2: f(x) = g(x) for all x ∈ G_0. Let g(x) = 0. This implies d_x = 0 or x_1 = 0. Let y ∈ X satisfy d_y = d_x + 1 and y_i = 1 for all i ∈ {1, …, d_y}. By part 1, f(y) = 1. Since x < y, by SM, f(x) < f(y) = 1. It then follows from f(x) ∈ N that f(x) = 0 = g(x).

Part 3: f(x) ≤ g(x) for all g ∈ N∖{0, 1} and x ∈ G_g. Taking parts 1 and 2 as the base case of an induction argument, choose g ∈ N∖{0, 1} and assume that f(x) ≤ g(x) for all x ∈ G_0 ∪ … ∪ G_{g−1}. To prove that f(x) ≤ g(x) for all x ∈ G_g, choose x ∈ G_g. Therefore, a_g^x ≥ g and, if d_x ≥ g + 1, a_{g+1}^x ≤ g. It must be shown that f(x) ≤ g. Let k be the largest member of {1, …, d_x} such that x_k = 0.

Case 1: c_g^x = g^2.

Case 1a: k > g. Let y ∈ X be obtained from x by removing one citation from every x_i such that i ∈ {1, …, k − 1}. This removes k − 1 ≥ g citations from the k − 1 most cited papers in output x. Hence, x = y + z, where d_z = k − 1 and z_i = 1 for all i ∈ {1, …, k − 1}. Clearly, y* = y and z* = z. By S and part 1, f(x) ≤ f(y*) + f(z*) = f(y) + f(z) = f(y) + 1. By definition of y, c_g^y = c_g^x − g = g^2 − g. Therefore, a_g^y = g − 1 and g(y) < g. This and the induction hypothesis imply f(y) < g. Given that f(x) ≤ f(y) + 1, f(y) < g implies f(x) ≤ g.

Case 1b: k ≤ g. Let y ∈ V be obtained from x by removing g citations from x_1. Thus, x = y + z, where z = (g). As z* = z, by S and part 1, f(x) ≤ f(y*) + f(z*) = f(y*) + f(z) = f(y*) + 1. By definition of y, c_g^{y*} = c_g^x − g = g^2 − g, so a_g^{y*} = g − 1 < g. Since k ≤ g, c_i^{y*} = c_g^{y*} for all i ∈ {g + 1, …, d_{y*}}. Hence, a_i^{y*} ≤ a_g^{y*} < g for all i ∈ {g + 1, …, d_{y*}}.
Consequently, g(y*) < g. By the induction hypothesis, f(y*) ≤ g(y*). Accordingly, f(y*) < g. This and f(x) ≤ f(y*) + 1 imply f(x) ≤ g.

Case 2: c_g^x > g^2. This requires x_1 > x_2.

Case 2a: k > g. Let y ∈ X be obtained from x by removing two citations from x_1 and one citation from every x_i such that i ∈ {2, …, k − 1}. This removes k > g citations from the k − 1 most cited papers in output x. Thus, x = y + z, where d_z = k − 1, z_1 = 2, and z_i = 1 for all i ∈ {2, …, k − 1}. As x_1 > x_2, y* = y and z* = z. By S and part 1, f(x) ≤ f(y*) + f(z*) = f(y) + f(z) = f(y) + 1. By definition of y, c_g^y = c_g^x − g − 1. It follows from g(x) = g that g ≤ c_g^x / g < g + 1. That is, c_g^x < g^2 + g. Accordingly, c_g^y < g^2 − 1 and a_g^y < g − 1/g. This means that g(y) < g. By the induction hypothesis, f(y) < g. This and f(x) ≤ f(y) + 1 imply f(x) ≤ g.

Case 2b: k ≤ g. Let y ∈ V be obtained from x by removing g + 1 citations from x_1. As a result, x = y + z, where z = (g + 1). Since z* = z, by S and part 1, f(x) ≤ f(y*) + f(z*) = f(y*) + f(z) = f(y*) + 1. By definition of y, c_g^{y*} = c_g^x − g − 1. Given that g(x) = g, c_g^x / g < g + 1. Equivalently, c_g^x < g^2 + g. In view of this, c_g^{y*} = c_g^x − g − 1 implies c_g^{y*} < g^2 + g − g − 1 = g^2 − 1. In sum, a_g^{y*} < g − 1/g < g. As k ≤ g, c_i^{y*} = c_g^{y*} for all i ∈ {g + 1, …, d_{y*}} and, hence, a_i^{y*} ≤ a_g^{y*} < g. In view of this, g(y*) < g. By the induction hypothesis, f(y*) < g. This and f(x) ≤ f(y*) + 1 imply f(x) ≤ g.

Proposition 3.7.

An index f satisfies U′, S, AM, and SM if and only if f = g.
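An analogous finite check for the g-index, under the same assumptions (tuple encoding, hypothetical helper names, outputs with at most four papers and five citations per paper), tests U′, AM, and SM. It also shows, with the illustrative output (5, 1), why U′ rather than U is the unit condition suited to g: (5, 1) meets the premise of U, yet its g-index is 2.

```python
from itertools import accumulate, combinations_with_replacement

# Finite sanity check, not a proof. Assumed encoding: outputs are non-increasing
# tuples of non-negative ints, () is the empty output; all names are hypothetical.


def integer_averages(x):
    return tuple(c // n for n, c in enumerate(accumulate(x), start=1))


def g_index(x):
    averages = integer_averages(x)
    return max((n for n, an in enumerate(averages, start=1) if an >= n), default=0)


def strictly_below(x, y):  # Definition 2.5
    if not x:
        return len(y) > 0 and y[0] > 0
    return len(x) < len(y) and x[0] < y[-1]


def weakly_below(u, v):  # Definition 2.6, applied to average vectors for AM
    return len(u) <= len(v) and all(a <= b for a, b in zip(u, v))


def all_outputs(max_papers, max_citations):
    outputs = [()]
    for d in range(1, max_papers + 1):
        # Drawing from a descending range yields non-increasing tuples.
        outputs.extend(combinations_with_replacement(range(max_citations, -1, -1), d))
    return outputs


def premise_U(x):
    return (len(x) == 1 and x[0] >= 1) or (len(x) >= 2 and x[1] <= 1)


def premise_U_prime(x):
    return (len(x) == 1 and x[0] >= 1) or (len(x) >= 2 and integer_averages(x)[1] <= 1)


universe = all_outputs(4, 5)
g_of = {x: g_index(x) for x in universe}            # cache g for every output
avg_of = {x: integer_averages(x) for x in universe}  # cache a^x for every output

u_prime = all(g_of[x] <= 1 for x in universe if premise_U_prime(x))
am = all(g_of[x] <= g_of[y] for x in universe for y in universe
         if weakly_below(avg_of[x], avg_of[y]))
sm = all(g_of[x] < g_of[y] for x in universe for y in universe
         if strictly_below(x, y))
print(u_prime, am, sm)  # True True True

# g does not satisfy U: (5, 1) meets the premise of U but has g-index 2
print(premise_U((5, 1)), g_index((5, 1)))  # True 2
```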

Proof. “⇒” Lemmas 3.5 and 3.6. “⇐” It should not be difficult to verify that g satisfies U′ and AM. With respect to SM, suppose x < y. If x = ∅, then, by Definition 2.5(i), y_1 > 0. That is, g(∅) = 0 and g(y) ≥ 1. If x ≠ ∅, then y ≠ ∅ and, by Definition 2.5(ii), d_x < d_y and x_1 < y_{d_y}. Let g(x) = g. Then g ≤ d_x < d_y, so y_{g+1} exists, and x_1 ≥ g (for g ≥ 1, because x_1 = a_1^x ≥ a_g^x ≥ g). It follows from y_{g+1} ≥ y_{d_y} > x_1 ≥ g that y_{g+1} > g, so each of the first g + 1 papers in y has at least g + 1 citations and g(y) ≥ g + 1. Hence, g(y) > g = g(x). As regards S, assume that x ∈ V and y ∈ V satisfy x + y ∈ X. Letting g(x*) = g_x and g(y*) = g_y, it must be shown that g(x + y) ≤ g_x + g_y. Since g(x*) = g_x, the largest set of papers in x with an average of more than g_x citations has g_x members. Therefore, each of the remaining d_x − g_x papers has, at most, g_x citations. Similarly, having g(y*) = g_y means that the largest set of papers in y with an average of more than g_y citations has g_y members, for which reason each of the remaining d_y − g_y papers has, at most, g_y citations. Consequently, when x + y is formed, there cannot be a set of papers with more than g_x + g_y members and with an average number of citations larger than g_x + g_y.

Proposition 3.8.

An index f satisfies Z, CM′, S, AM, and SM if and only if f = g.

Proof. By Remark 2.9 and Proposition 3.7, it is enough to show that g satisfies Z and CM′. It is plain that g satisfies Z. With respect to CM′, let g(x) > 0 and k ∈ {0, 1, …, g(x) − 1}. It must be shown that, for some y ∈ X̄, g(y) = k, d_y < d_x, and, if d_y ≥ 1, a_1^y < a_i^x for all i ∈ {1, …, d_y}. If g(x) = 1, then x_1 ≥ 1 and k = 0, so y = ∅ satisfies g(y) = k and 0 = d_y < d_x. If g(x) > 1, then the g(x) most cited papers in x have an average number of citations of at least g(x). Letting y = p_k, it follows that g(y) = k. Since k < g(x) ≤ d_x, d_y < d_x. Finally, x is such that a_i^x ≥ g(x) > k for all i ∈ {1, …, g(x)}. Hence, for all i ∈ {1, …, d_y} = {1, …, k}, a_1^y = k < a_i^x.

Acknowledgements

Financial support from the Secretaría de Estado de Investigación of the Spanish Ministerio de Ciencia e Innovación (research project SEJ2007-67580-C02-01) is gratefully acknowledged. Many thanks to the two reviewers for helpful comments and to the editor, Leo Egghe.

References

Burgos, A. (2010). Ranking scientists. Working paper, Departamento de Fundamentos del Análisis Económico, Universidad de Murcia. http://digitum.um.es/xmlui/bitstream/10201/10609/1/WPUMUFAE.2010.02.pdf
Egghe, L. (2006a). An improvement of the h-index: The g-index. ISSI Newsletter, 2, 8–9.
Egghe, L. (2006b). Theory and practice of the g-index. Scientometrics, 69, 131–152.
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102, 16569–16572.
Marchant, T. (2009). An axiomatic characterization of the ranking based on the h-index and some other bibliometric rankings of authors. Scientometrics, 80, 325–342.
Quesada, A. (2009). Monotonicity and the Hirsch index. Journal of Informetrics, 3, 158–160.
Quesada, A. (2010a). More axiomatics for the Hirsch index. Scientometrics, 82, 413–418.
Quesada, A. (2010b). Further characterizations of the Hirsch index. To appear in Scientometrics.
Woeginger, G. J. (2008a). An axiomatic characterization of the Hirsch-index. Mathematical Social Sciences, 56, 224–232.
Woeginger, G. J. (2008b). A symmetry axiom for scientific impact indices. Journal of Informetrics, 2, 298–303.
Woeginger, G. J. (2008c). An axiomatic analysis of Egghe’s g-index. Journal of Informetrics, 2, 364–368.
Woeginger, G. J. (2009). Generalizations of Egghe’s g-index. Journal of the American Society for Information Science and Technology, 60, 1267–1273.
