Journal of Informetrics 14 (2020) 101015
Contents lists available at ScienceDirect
Journal of Informetrics journal homepage: www.elsevier.com/locate/joi
Regular article
Patent citation inflation: The phenomenon, its measurement, and relative indicators to temper its effects
Ying Huang a,b, Lixin Chen c, Lin Zhang a,b,∗
a School of Information Management, Wuhan University, China
b Centre for R&D Monitoring (ECOOM) and Department of MSI, KU Leuven, Belgium
c School of Information Technology, Shangqiu Normal University, China
Article info
Article history: Received 30 November 2018; received in revised form 16 January 2020; accepted 17 January 2020
Keywords: Citation inflation; Patent obsolescence; Relative citation indicator; Reference share; Research evaluation
Abstract
In recent decades, the United States Patent and Trademark Office (USPTO) has been granting more and more patents with more and more references, which has led to patent citation inflation. Citation counts are a fundamental consideration in decisions about research funding, academic promotions, commercializing IP, investing in technologies, etc. With so much at stake, we must be sure we are valuing citations at their true worth. In this article, we reveal two types of patent citation inflation and analyze its causes and cumulative effects. Further, we propose some alternative indicators that more accurately reflect the true worth of a citation. A case study on the patents held by eight universities demonstrates that the relative indicators outlined in this paper are an effective way to account for citation inflation as an alternative approach to evaluating patent activity. © 2020 Elsevier Ltd. All rights reserved.
1. Introduction
In economics, it is well known that inflation reflects devaluation, i.e., that purchasing power decreases with excessive growth in the supply of money (Abel & Bernanke, 2005; Barro, 1997). Does the same apply when citation inflation occurs? And, if so, how can an idea’s value be accurately measured with citations? These are important and interesting issues.
Half a century ago, de Solla Price (1961, 1963) proposed that the number of scientific publications would grow exponentially. Although the law of exponential growth in science has not been proven to be entirely correct, scientific research publications are experiencing very rapid growth (Larsen & von Ins, 2010). Patents have undergone similar growth, as evidenced by the increase in patents granted by the United States Patent and Trademark Office (USPTO) over the past several decades. For example, in 1976, the USPTO granted around 70,000 patents, but, by 2015, that number had more than quadrupled to around 300,000.
Although the external forces that drive patents differ from those that drive scientific publications in several respects (for example, economic incentives for R&D, competition, the need for market dominance, or the need to take public ownership of one’s ideas), both patents and scientific articles contain references that offer a means of tracing knowledge interactions and technology flows. Further, the mechanisms that generate the references within those outputs have some commonalities. References trace the provenance of knowledge and properly attribute the work of others (Criscuolo & Verspagen, 2008;
∗ Corresponding author. E-mail address: zhanglin
[email protected] (L. Zhang). https://doi.org/10.1016/j.joi.2020.101015 1751-1577/© 2020 Elsevier Ltd. All rights reserved.
Y. Huang, L. Chen and L. Zhang / Journal of Informetrics 14 (2020) 101015
Jaffe, Trajtenberg, & Fogarty, 2000). Authors, applicants, and patent examiners all conduct thorough searches for relevant references, often deeming an innovation to be novel when they do not find any (Cotropia, Lemley, & Sampat, 2013), and, today, the internet and information retrieval systems make it very easy to conduct exhaustive literature reviews that result in extensive reference lists. Cotropia (2009) and Sampat (2010) pointed out that an abundance of references can benefit patent applicants because their patents might earn the presumption of validity against the cited art in litigation since the examiners might not scrutinize each and every reference in an exhaustive list. Greater output and more references per unit of output would undoubtedly cause an explosion in references with the inevitable result of citation inflation. In this article, we have chosen to examine patent citation inflation, i.e., citation inflation in a corpus of patents. Although we cannot attest that all our conclusions will hold true for the academic literature, we do believe that some of our findings will apply. Moreover, we hope our overarching purpose shines through, which is to explore the efficacy of traditional research performance indicators given the phenomenon of citation inflation. Hence, for this reason and for brevity, we simply use the term citation inflation throughout this article. In recent decades, the references in patents have been used extensively to: trace the spread of technical knowledge (Chen & Hicks, 2004; Hu & Jaffe, 2003; Nelson, 2009; Park & Suh, 2013); chart technological trajectories (Epicoco, 2013; Érdi et al., 2013; Huang et al., 2017; Martinelli, 2012); map technical domains (Wang, Zhang, & Xu, 2011; Weng & Daim, 2012); and even to identify untapped opportunities for innovation (Albert, Avery, Narin, & McAllister, 1991; Lanjouw & Schankeman, 2004; Verspagen, 2000; von Wartburg, Teichert, & Rost, 2005). 
With so many studies relying on patents as a data source, the phenomenon of exploding references has not escaped attention. Hall, Jaffe, and Trajtenberg (2001) highlight that: “The combination of more patents making more citations suggests a kind of citation ‘inflation’ that may mean that later citations are less significant than earlier ones.” They also proposed several methods of tempering the effects of citation inflation (Hall, Jaffe, & Trajtenberg, 2000; Hall et al., 2001; Jaffe & Lerner, 2001; Jaffe & Trajtenberg, 1999). Persson et al. (2004) studied inflation in fundamental bibliometric indicators, including citation counts, from the perspective of scientific collaboration, and concluded that relative indicators are needed to guarantee the validity of findings drawn from bibliometric results. Such calls have also piqued concern over inequitable journal impact factors as a result of citation inflation, especially self-citations (Chorus & Waltman, 2016; Heneberg, 2014, 2016; Ioannidis, 2015; Opthof, 2013). Correctly evaluating the worth of citations is essential: just as we cannot directly compare the box office takings of a movie released in 1939 with those of one released this year, directly comparing the citations received by two pieces of research produced decades apart is equally absurd. Quantitative science & technology evaluation requires measures that are transparent, relatively simple, and free of any bias due to time, discipline, or field. In a very recent work, Petersen, Pan, Pammolli, and Fortunato (2019) provided a solution to the paper citation inflation that arises from growth in scientific publications and/or the increasing length of reference lists. However, patent citation inflation has not yet been addressed from a systematic perspective.
With the above problems in mind, we set ourselves three goals:
1) To conduct an in-depth exploration of the phenomenon of citation inflation using patents as a corpus;
2) To identify the different types of citation inflation and devise appropriate methods for measuring the impact of each;
3) To develop relative indicators that more accurately reflect the worth of citation counts by properly accounting for inflation.
2. Data trends and observations
2.1. Data collection
As an important background note, throughout this paper, we make a distinction between “citations” and “references”: a reference is given, and a citation is received. To help keep this distinction in mind, think of a reference list in a paper that gives credit to the work of other scholars, while citation counts reflect the acknowledgments we as scholars receive for our work. Our corpus for analysis consisted of USPTO utility patents granted between 1976 and 2015. To assemble the data, we downloaded the entire text of each patent from the USPTO website, including its reference list, and removed any references to non-USPTO patents or other sources of literature. The final dataset contained 5.27 million utility patents granted between 1976 and 2015, with 70.99 million references and 59.82 million citations. (The other 11 million references were given to USPTO patents granted prior to 1976.)
2.2. The symptoms of citation inflation
Our first step was to chart the number of granted patents, the citations received, and the references given, broken down by year, as shown in Fig. 1. The number of patents granted increased by 3.79 % per year on average, and the average number of references per patent grew by 3.63 % per year. This is an explicit representation of the reference explosion caused by an increase in both the number of patents granted and the average number of references they include.
There is a common understanding that older patents should have received more citations simply by virtue of time, but that the average number of citations received per year for most patents should follow the same basic pattern, i.e., substantial increases in citation counts for the first five years after publication that gradually taper to fewer and fewer citations beyond five years. This is the theory of obsolescence (Brookes, 1970; Burton & Kebler, 1960; de Solla Price, 1965; Gosnell, 1944;
Fig. 1. The number of patents granted, references given, and citations received per year (millions).
Fig. 2. Average annual citations for patents granted in selected years over the period.
Griffith, Servi, Anker, & Drott, 1979; Marton, 1985). However, contrary to this conception, our analysis shows that the theory of obsolescence no longer strictly holds true. Patents no longer share the same year-by-year pattern of citations as they age, nor are the citation counts of the same magnitude. The pace of obsolescence is slowing down, and patents granted more recently are receiving several times as many citations as their predecessors did at the same age. Looking at magnitude first, Fig. 2 takes six groups of patents, granted five years apart across the period, and shows the average number of citations received each year afterward. The resulting chart is an explicit indication that recent patents are receiving more citations than older ones. As a case in point, in 1986, the average ten-year-old patent could expect to receive 0.266 citations that year, whereas, in 2011, a ten-year-old patent could expect to receive 1.758 citations. That is an increase of roughly 560 % (a factor of about 6.6) in just 25 years and a clear symptom of citation inflation. The second symptom is that the pace of obsolescence is slowing down. Fig. 3 shows the half-life of patent references (not citations) measured with the asynchronous method (see footnote 1). Clearly, the half-life of patent references is becoming longer and longer. In other words, the average age of the technology being referenced is growing older and older. For instance, in 2000, the average reference was to an 8-year-old patent. In 2015, this number had shifted back along the scale to an average of 14 years. As mentioned, the theory of obsolescence states that older patents should gradually be referenced fewer and fewer times. This is still the case, but what constitutes “old” is changing: patents are not becoming obsolete as quickly as expected. This is another clear impact of citation inflation.
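The reference half-life discussed above can be illustrated with a short Python sketch. It assumes the common reading of half-life as the median reference age, i.e., the smallest age by which at least half of the references given in a year are accounted for; the source does not spell out its exact computation, so this is an assumption. The function name `reference_half_life` and the two toy histograms are hypothetical and are not data from the corpus.

```python
from typing import Mapping


def reference_half_life(refs_by_age: Mapping[int, int]) -> int:
    """Return the smallest age a such that references to patents aged <= a
    make up at least half of all references given in the year.

    Assumption: half-life is read as the median reference age.
    """
    total = sum(refs_by_age.values())
    if total <= 0:
        raise ValueError("the histogram must contain at least one reference")
    running = 0
    for age in sorted(refs_by_age):
        running += refs_by_age[age]
        # Compare in integers to avoid floating-point issues at the midpoint.
        if 2 * running >= total:
            return age
    raise AssertionError("unreachable: the running total always reaches the total")


# Hypothetical histograms: {age of the referenced patent: number of references}.
early_year = {0: 10, 1: 30, 2: 40, 3: 30, 4: 20, 5: 10}
later_year = {0: 5, 1: 15, 2: 25, 3: 30, 4: 30, 5: 35}

print(reference_half_life(early_year))  # 2
print(reference_half_life(later_year))  # 3
```

In the second toy histogram the mass of references has shifted toward older patents, so the half-life lengthens, which is the pattern the text describes for 2000 versus 2015.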
1 There are two methods of measuring half-life: diachronous and asynchronous. The diachronous method measures the half-life of citations. The asynchronous method measures the half-life of references.
Fig. 3. The number of references given in selected years to patents by age.
Overall, our analysis of the past 40 years shows four clear indications of citation inflation: the number of patents granted each year is increasing; patents contain more references on average; recent patents are receiving more citations than older patents; and the pace of obsolescence is slowing down.
3. Measurement: two types of citation inflation
In economics, inflation relates to devaluation caused by an increase in the money supply, which leads to a general rise in the price of goods and services and, in turn, results in a loss of currency value. In this study, citation inflation refers to a devaluation of citation counts caused by an increased supply of patents and/or references. However, before we set out to measure citation inflation, we need to be aware that there are two types of citation inflation: absolute and comparative. Absolute citation inflation means that patents granted in a certain year receive more citations in one year/period than in another year/period. Comparative citation inflation means that patents granted in a certain year receive relatively more citations over a subsequent year/period than patents granted in another year receive over a period of the same length. In other words, comparative citation inflation reflects changes in the number of citations two different patents receive over the same length of time. The idea here is to ensure an “apples to apples” comparison. For the sake of simplicity, the above definitions are couched in the singular, but note that both kinds of inflation can be assessed for groups of patents. Returning to our example of the movie box office provides a useful illustration of the difference between the two. The 1997 film “Titanic” earned $44 million in 1998 in China but $145 million when it was re-released in 2012. The increase between these two years can be viewed as a kind of absolute inflation.
Alternatively, comparative citation inflation can be likened to the 2019 film “Avengers: Endgame,” which has earned $2.796 billion in the last six months compared to “Avatar,” which has earned $2.789 billion since 2009. When accounting for the price of a movie ticket and the population growth between 2009 and 2019, “Avatar” has performed better than “Avengers: Endgame.” Such a comparison between films released in different years can be treated as the phenomenon of comparative inflation.
3.1. Absolute citation inflation
Citation inflation is caused by two factors: an increase in the number of patents granted (PGR) and a general increase in the number of references within each patent (RGR). As with all increases, these factors are measured over a period. In the equations below, this period is defined as being from the year n to the year m.
\[ \mathrm{PGR}_{m,n} = \frac{N_m - N_n}{N_n} = \frac{N_m}{N_n} - 1 \tag{1} \]
\[ \mathrm{RGR}_{m,n} = \frac{R_m - R_n}{R_n} = \frac{R_m}{R_n} - 1 \tag{2} \]
where N_m and N_n (R_m and R_n) represent the number of patents granted (references given) in years m and n, respectively. We charted these growth rates over the full period of the corpus, as shown in Fig. 4. What is immediately apparent is that the growth in patents and the growth in references have almost the same fluctuations: the number of references rises and falls almost directly in line with the number of patents. If a marked increase in the number of references per patent were the main contributor, the two trends would not be moving in parallel; rather, the reference line would be showing its own independent growth pattern. Our result to the contrary demonstrates that the main cause of citation inflation has been growth in the number of patents. Another interesting observation from this rate-of-change perspective is that many
Fig. 4. The annual growth rate of patents granted and references given during 1977–2015.
years have experienced growth, and some substantially so at greater than 30 %. Thus, inflation has been both severe and frequent over this 40-year period. In straightforward terms, absolute citation inflation can be measured with the following equation:
\[ \mathrm{CGR}_{m,n,q} = \frac{C_{m,q} - C_{n,q}}{C_{n,q}} = \frac{C_{m,q}}{C_{n,q}} - 1 \tag{3} \]
where C_{m,q} represents the number of citations received by patents granted in year q from patents granted in year m. For example, if patents granted in 1976 received 27,962 citations in 2013 versus 28,566 citations in 2014, they experienced citation inflation of 2.16 %, i.e., CGR_{2014,2013,1976} = (C_{2014,1976}/C_{2013,1976}) − 1 = (28,566/27,962) − 1 = 0.0216, as a result of an increase in the number of references supplied in 2014. At face value, Eq. (3) indicates inflation. However, as we know, patents naturally obsolesce, so part of any year-to-year change in citations is merely due to time passing. An explicit indication of citation inflation is when older patents receive a greater number of citations during periods of citation/reference inflation despite the decreased probability of them being referenced. Therefore, a modified citation growth rate is needed that accounts for obsolescence, i.e., OCGR:
\[ \mathrm{OCGR}_{m,n,q} = \frac{\dfrac{C_{m,q}}{1 + O_{n-q,\,m-q}} - C_{n,q}}{C_{n,q}} = \frac{C_{m,q}}{(1 + O_{n-q,\,m-q})\, C_{n,q}} - 1 \tag{4} \]
Here, O_{n−q,m−q} represents the rate of obsolescence for patents granted in year q, defined as the rate of change in the share of patents being referenced from age n−q to age m−q. A relative indicator, reference share, can be used to eliminate the bias caused by citation/reference inflation when measuring patent obsolescence. Reference share is calculated as the ratio of the number of times a patent was cited to the sum of all references given:
\[ S_{m,q} = \frac{C_{m,q}}{R_m} \tag{5} \]
where S_{m,q} denotes the reference share of a patent granted in year q in the year m, C_{m,q} is the number of citations the patent received in year m, and R_m is the number of references provided by all patents granted in the year m. This indicator gives a more reasonable measure of citations since it removes the bias introduced by citation inflation. The patent obsolescence rate can be calculated by measuring the rate of change in the reference share:
\[ O_{a,b} = \frac{S_b - S_a}{S_a} = \frac{S_b}{S_a} - 1 \tag{6} \]
where O_{a,b} is the obsolescence rate of patents from age a to age b, and S_a is the average share of the patents aged a being cited. In a given period from year n to m, S_a is calculated with the following equation:
\[ S_a = \frac{\sum_{i=n}^{m} C_{i,\,i-a}}{\sum_{i=n}^{m} R_i} \tag{7} \]
where R_i is the number of references provided in year i, and C_{i,i−a} is the number of times patents aged a in the year i (i.e., granted in year i − a) have been cited.
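Eqs. (1)–(7) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names and the toy inputs to `average_reference_share` are hypothetical, while the citation counts (27,962 and 28,566) and the obsolescence rate O_{37,38} = −0.053 are the values quoted in this section for patents granted in 1976.

```python
from typing import Mapping


def growth_rate(later: float, earlier: float) -> float:
    """Eqs. (1)-(3): relative change x_m / x_n - 1 (used for PGR, RGR and CGR)."""
    if earlier == 0:
        raise ValueError("the earlier value must be non-zero")
    return later / earlier - 1.0


def reference_share(citations: float, references: float) -> float:
    """Eq. (5): S_{m,q} = C_{m,q} / R_m."""
    return citations / references


def average_reference_share(cites_at_age: Mapping[int, float],
                            references: Mapping[int, float]) -> float:
    """Eq. (7): S_a = sum_i C_{i,i-a} / sum_i R_i.

    Both mappings are keyed by citing year i; cites_at_age[i] holds the
    citations given in year i to patents that were aged a in that year.
    """
    years = sorted(set(cites_at_age) & set(references))
    return sum(cites_at_age[i] for i in years) / sum(references[i] for i in years)


def obsolescence_rate(share_a: float, share_b: float) -> float:
    """Eq. (6): O_{a,b} = S_b / S_a - 1."""
    return share_b / share_a - 1.0


def ocgr(c_m: float, c_n: float, obsolescence: float) -> float:
    """Eq. (4): obsolescence-adjusted citation growth rate."""
    return c_m / ((1.0 + obsolescence) * c_n) - 1.0


# Values quoted in the text for patents granted in 1976.
cgr_1976 = growth_rate(28_566, 27_962)       # about 0.0216
ocgr_1976 = ocgr(28_566, 27_962, -0.053)     # about 0.079

# Hypothetical toy data for Eqs. (6) and (7), not taken from Table A3.
toy_share = average_reference_share({2013: 50, 2014: 70}, {2013: 1000, 2014: 1400})
toy_rate = obsolescence_rate(0.05, 0.04)     # a 20 % drop in reference share

print(round(cgr_1976, 4), round(ocgr_1976, 3), toy_share, round(toy_rate, 2))
```

Because the raw growth rate and the adjusted rate share the same inputs, comparing `cgr_1976` with `ocgr_1976` shows how much of the apparent growth is masked when obsolescence is ignored.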
Fig. 5. The annual citation growth rates of patents in selected periods. Note: “patents” and “references” denote the annual growth rates of patents and their references.
Using the references in USPTO patents granted during 1976–2015, we calculated the average reference share for patents of different ages (i.e., the average probability density of being cited). The corresponding values are shown in Table A3. The average reference share indicates that patents gradually obsolesce with age, demonstrating that the patent obsolescence rate can be calculated according to the average reference share. In the table, S_{m,q} signifies the reference share of an individual patent and S_a signifies the average reference share of all patents aged a. Further, the same example as above can be used to demonstrate more clearly how obsolescence is taken into account. The average citation count for a patent granted in 1976 should have decreased by 5.3 % from 2013 to 2014 due to obsolescence (according to Eq. (6) and Table A3, O_{n−q,m−q} = O_{2013−1976,2014−1976} = O_{37,38} = −0.053). Hence, the actual inflation rate from 2013 to 2014 for patents issued in 1976 is 7.9 %, i.e., OCGR_{2014,2013,1976} = C_{2014,1976}/[(1 + O_{37,38}) × C_{2013,1976}] − 1 = 28,566/[(1 − 0.053) × 27,962] − 1 = 0.079. When an old patent has received more citations in recent years than during its traditional “prime”, this is a strong signal that significant citation inflation has occurred or is occurring. The annual citation growth rates of patents granted in different years during 1980–2015 are shown in Fig. 5. The analysis begins five years after the patents have been granted because it is normal for a patent’s citations to gradually increase over the first five years (Burton & Kebler, 1960; de Solla Price, 1965; Marton, 1985), so any increase in citations during that time is likely not a reflection of citation inflation. Both Fig. 5a and b show that the annual growth rates of patents, references, and citations all have similar fluctuations. The results also show that citation inflation and deflation have both occurred frequently, but inflation has been more pronounced than deflation.
However, when comparing Fig. 5a, which does not take obsolescence into account, and Fig. 5b, which does, the modified growth rates are generally higher and more consistent with the annual growth rates of patents and references.
3.2. Comparative citation inflation
Comparative citation inflation considers that patents granted in a certain year have received relatively more citations than patents granted in another year over a subsequent year/period of the same length. Here, the rate of change in the average number of citations as of year m needs to be calculated. Additionally, the citation window needs to be normalized to avoid the inherent bias toward older patents, which have had more time to accumulate citations. The equations are:
\[ \mathrm{CCGR}^{(m-p)}_{p,q} = \frac{AC_{(m-p)+p,\,p}}{AC_{(m-p)+q,\,q}} - 1 = \frac{AC_{m,p}}{AC_{m-p+q,\,q}} - 1 \tag{8} \]
\[ AC_{m,q} = \sum_{i=q}^{m} AAC_{i,q} = \sum_{i=q}^{m} \frac{C_{i,q}}{N_q} \tag{9} \]
where CCGR^{(m−p)}_{p,q} denotes the comparative citation growth rate for the group of patents granted in year p compared to the group granted in year q, each with an (m−p)-year citation window. AC_{m,q} is the average number of citations received by patents granted in year q as of year m; AAC_{i,q} is the average number of citations received by patents granted in year q from patents granted in year i; and C_{i,q} is the number of citations received by patents granted in year q from patents granted in year i. As Fig. 4 shows, the most serious citation inflation occurred in 2010, alongside severe patent inflation (31.28 %) and severe reference inflation (41.07 %). This confluence of factors resulted in comparative inflation. Therefore, 2010 is an interesting year to examine in closer detail, as illustrated in Fig. 6. The growth rate from 2009 to 2010 in the average number of citations received per year (AACGR_{2010}) ranged from 0.324 to 0.553. (The spread from minimum to maximum arises because different patents have different capabilities to receive citations.) When reference inflation is occurring, patents granted in more recent years can accumulate citations more quickly than older patents and over a very short time (see Fig. 2). This is a key impact of comparative citation inflation.
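To make the windowing in Eqs. (8) and (9) concrete, the following Python sketch compares two cohorts over citation windows of equal length. The function names, the `(citing year, grant year)` keying of the citation counts, and all numbers are hypothetical choices made for illustration; they are not taken from the corpus.

```python
from typing import Mapping, Tuple

# (citing year i, grant year q) -> C_{i,q}
CitationCounts = Mapping[Tuple[int, int], float]


def average_citations(citations: CitationCounts,
                      patents_granted: Mapping[int, float],
                      grant_year: int,
                      as_of_year: int) -> float:
    """Eq. (9): AC_{m,q} = sum_{i=q}^{m} C_{i,q} / N_q."""
    total = sum(citations.get((i, grant_year), 0.0)
                for i in range(grant_year, as_of_year + 1))
    return total / patents_granted[grant_year]


def ccgr(citations: CitationCounts,
         patents_granted: Mapping[int, float],
         p: int,
         q: int,
         window: int) -> float:
    """Eq. (8) with window = m - p: AC_{p+window,p} / AC_{q+window,q} - 1."""
    ac_p = average_citations(citations, patents_granted, p, p + window)
    ac_q = average_citations(citations, patents_granted, q, q + window)
    return ac_p / ac_q - 1.0


# Hypothetical cohorts granted in 2000 and 2001, each observed for two years.
toy_patents = {2000: 100, 2001: 120}
toy_citations = {
    (2000, 2000): 10, (2001, 2000): 40, (2002, 2000): 50,   # 100 citations
    (2001, 2001): 18, (2002, 2001): 60, (2003, 2001): 72,   # 150 citations
}

toy_ccgr = ccgr(toy_citations, toy_patents, p=2001, q=2000, window=2)
print(toy_ccgr)  # 0.25: the 2001 cohort averaged 1.25 citations versus 1.00
```

Fixing the window length for both cohorts is what keeps the comparison fair: the older cohort's extra years of exposure are deliberately excluded.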
Fig. 6. The annual average number of citations (AAC) received in 2009 vs. in 2010 and its growth rate.
Fig. 7. The average citation numbers in 2015 (AC_{2015,q}) and comparative citation growth rates comparing the patents granted in two adjacent years with a 10-year citation window (CCGR^{10}_{q,q−1}).
The frequency with which a patent receives citations is related to its age, and comparative inflation is not only determined by citation inflation but is also affected by patent age and obsolescence. As Fig. 7 shows, in 2015, 18-year-old patents (granted in 1998) had accumulated the greatest number of citations, reaching an average of 22.71, followed by older patents with fewer citations, and new patents with the fewest. This demonstrates that comparative citation inflation has a “sweet spot”. Old patents are less likely to attract citations due to obsolescence. The same is true for new patents, but in reverse: they are less likely to accumulate citations due to the short time window.
4. Indicators: replacing traditional indicators with relative indicators
4.1. The cumulative effects of citation inflation
To mitigate the effects of citation inflation, it is necessary to consider the causes of citation inflation. Compared to absolute patent citation inflation, the causes of comparative inflation are less transparent, so this is where we have focused our attention. Fig. 8 shows the growth in the average number of citations grouped by the year the patent was granted. For example, as of 2015, patents granted in 1976 had received an average of 11.52 citations over 40 years, whereas patents granted in 1998 had received 22.71 citations over 18 years. The graph clearly indicates that citation counts are increasing more quickly as time progresses. It also demonstrates that comparative citation inflation has occurred over the period. Delving deeper, the citation distribution in 5-year intervals during 1976–2015 is listed in Table 1. These statistics show that new citations received in the period 2011–2015 account for 41.9 % of the total citations received, and almost half of those (45.4 %) were received by patents aged between 10 and 19 years old. So, if old patents have received almost half of
Fig. 8. The annual growth rings according to the average citation numbers by patents granted in year q as of year m (AC_{m,q}). Note: The angular coordinate denotes the year q in which the cited patent was granted. The colored lines indicate the boundary year m of the citations received, from inside to outside in ascending order. (The outermost yellow line is 2015, which represents citations received during the period 1976–2015.)

Table 1 Citation distribution in 5-year intervals during 1976–2015. Rows give the period in which the patent was granted; columns give the share of citations received in each period.

| Period patent granted | 1976–1980 | 1981–1985 | 1986–1990 | 1991–1995 | 1996–2000 | 2001–2005 | 2006–2010 | 2011–2015 | Total |
|---|---|---|---|---|---|---|---|---|---|
| 1976–1980 | 0.003 | 0.009 | 0.009 | 0.008 | 0.008 | 0.008 | 0.008 | 0.011 | 0.063 |
| 1981–1985 | | 0.004 | 0.013 | 0.011 | 0.012 | 0.011 | 0.010 | 0.015 | 0.075 |
| 1986–1990 | | | 0.006 | 0.022 | 0.024 | 0.021 | 0.020 | 0.028 | 0.121 |
| 1991–1995 | | | | 0.009 | 0.039 | 0.036 | 0.035 | 0.048 | 0.167 |
| 1996–2000 | | | | | 0.016 | 0.061 | 0.074 | 0.099 | 0.250 |
| 2001–2005 | | | | | | 0.018 | 0.072 | 0.113 | 0.204 |
| 2006–2010 | | | | | | | 0.015 | 0.080 | 0.095 |
| 2011–2015 | | | | | | | | 0.025 | 0.025 |
| Total | 0.003 | 0.013 | 0.028 | 0.049 | 0.099 | 0.155 | 0.234 | 0.419 | 1.000 |
their citations in the last five years, citation inflation must be occurring, and the theory of obsolescence does not hold. When comparative citation inflation occurs, the pace of patent obsolescence slows down, with “teenage” patents as the biggest beneficiaries.
4.2. Alternative approaches to relative citation indicators used for patent evaluation
Jaffe and his collaborators proposed two methods to account for citation inflation (Hall et al., 2001; Henderson, Jaffe, & Trajtenberg, 1998; Jaffe & Trajtenberg, 1999). One is a fixed-effects approach that simply divides the citation count for one patent by the average counts over a group of patents (Eq. (10), Table 2). The other is a quasi-structural approach that draws
Table 2 Relative citation indicators used for patent evaluation.

| Equation | Description | Formula | Parameters |
|---|---|---|---|
| (10) | Fixed-effects: citation count vs. mean | $AC = C/\mu$ | Citation count: $C$; mean: $\mu$ = AC (Eq. (9)) |
| (11) | Cumulative reference share: the accumulated ratio of annual citations vs. annual references | $CR = \sum_{i=q}^{m} (C_i/R_i) = \sum_{i=q}^{m} S_{i,q}$ | Annual citation count: $C_i$; annual reference count: $R_i$; reference share: $S_{i,q}$ (Eq. (5)) |
| (12) | Cumulative share of (referring) patents: the accumulated ratio of annual citations vs. annual patents granted | $CN = \sum_{i=q}^{m} (C_i/N_i)$ | $C_i$; the number of patents: $N_i$ (referring patent share) |
| (13) | Citation lifecycle: the accumulated ratio of the annual citations vs. the annual mean | $CL = \sum_{i=q}^{m} S_{i-q}\,(C_i/\mu_i)$ | $C_i$; annual mean: $\mu_i$ = AAC (Eq. (9)); average reference share: $S_{i-q}$ (Eq. (7)) |
| (14) | Average citation lifecycle | $ACL = \dfrac{\sum_{i=q}^{m} S_{i-q}\,(C_i/\mu_i)}{\sum_{i=q}^{m} S_{i-q}}$ | $C_i$, $\mu_i$, $S_{i-q}$ |
Table 3 Examples of the relative citation indicators used for patent evaluation.

| Indicator | Example (A vs. B) | Example (C vs. D) | Bias |
|---|---|---|---|
| Fixed-effects | AC(A) = 65/18.0 = 3.62; AC(B) = 65/15.0 = 4.34; AC(B)/AC(A) = 1.20 | AC(C) = 171/18.0 = 9.52; AC(D) = 171/18.0 = 9.52; AC(D)/AC(C) = 1 | ACI (Y); O (Y); CCI (N); W (N) |
| Cumulative reference share | CR(A) = 34.8 × 10⁻⁶; CR(B) = 56.9 × 10⁻⁶; CR(B)/CR(A) = 1.64 | CR(C) = 64.7 × 10⁻⁶; CR(D) = 100 × 10⁻⁶; CR(D)/CR(C) = 1.55 | ACI (N); O (Y); CCI (N); W (Y) |
| Cumulative share referring patents | CN(A) = 426 × 10⁻⁶; CN(B) = 556 × 10⁻⁶; CN(B)/CN(A) = 1.30 | CN(C) = 904 × 10⁻⁶; CN(D) = 1151 × 10⁻⁶; CN(D)/CN(C) = 1.27 | ACI (N); O (Y); CCI (N); W (Y) |
| Citation lifecycle | CL(A) = 3.16; CL(B) = 4.89; CL(B)/CL(A) = 1.55 | CL(C) = 5.45; CL(D) = 8.46; CL(D)/CL(C) = 1.55 | ACI (N); O (N); CCI (N); W (Y) |
| Average citation lifecycle | ACL(A) = 3.73; ACL(B) = 5.52; ACL(B)/ACL(A) = 1.48 | ACL(C) = 5.45/0.85 = 6.43; ACL(D) = 8.46/0.85 = 9.99; ACL(D)/ACL(C) = 1.55 | ACI (N); O (N); CCI (N); W (N) |

Note: “Y” indicates the indicator includes the bias; “N” indicates it does not. “ACI (N)” and “CCI (N)” mean the bias of absolute citation inflation and comparative citation inflation, respectively, has been eliminated; “W (N)” means the citation time window has been taken into account; and “O (N)” means obsolescence (the citation lifecycle) has been taken into account.
on econometric estimation to identify and differentiate the influences over citation (Hall et al., 2000; Jaffe & Lerner, 2001). Inspired by these approaches, we developed some additional relative indicators, as shown in Eqs. (11)–(14) in Table 2. To better interpret and compare these indicators, we introduce four patents as examples: Patent A (No. 4979787, granted in 1990), Patent B (No. 4529663, granted in 1985), Patent C (No. 4926867, granted in 1990) and Patent D (No. 4959774, granted in 1990). As of the end of 2015, Patents A and B had each been cited 65 times. Traditionally, the two patents would be regarded as having the same impact. However, when taking citation inflation into account, we see a different picture. Based on the percentile rank approach (the percentage of counts in the frequency distribution that are lower than a given count), Patent B has a higher impact than Patent A because Patent B sits within the top 2.7 % of patents according to its citation share, whereas Patent A sits in the top 4.4 % and would have had to receive at least 86 citations to equal Patent B. Table 3 illustrates this example using the one alternative approach proposed by Jaffe et al. and the four we devised to measure and evaluate citation counts. All five indicate the relative value of Patent B’s citations to be higher than Patent A’s, even though they have received the same number of citations. Note that this example is a case of measuring comparative inflation, where any of the above equations can be used to reveal its impacts from different perspectives. However, only Eqs. (11)–(14) in Table 2 can be used to eliminate the effects of absolute citation inflation because these approaches consider annual citation growth. Patents C and D were both granted in 1990 and both have been cited 171 times. When measuring two patents that were both granted in the same year with the same number of citations, Eq. (10) would show each as having equal impact. However, Eqs. (11)–(14) would show that Patent D has actually had a greater impact than Patent C because Patent C received most of its citations during a period of serious reference inflation, as shown in Fig. 9. The above example indicates that Jaffe et al.’s fixed-effects approach in Eq. (10) fails to measure absolute citation inflation because it only considers the total citation count divided by the mean, and ignores the years in which the citations were received. Eq. (11) is a modified version of Eq. (10) that accounts for both absolute and comparative citation inflation by replacing total citations with an annual count. Eq. (12) is the simplest approach because it only accounts for the main cause of inflation: growth in the number of patents. Eqs. (13) and (14) are the most complicated but most effective approaches because they consider three informative parameters: the annual citation count, the annual mean, and the average reference share.
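The contrast between Eq. (10) and Eqs. (11)–(14) can be sketched in Python with two hypothetical patents that mirror the Patent C versus Patent D situation: same grant year, same total citations, different timing. The formulas follow our reading of Table 2; the function names, the symbol for the annual mean, and every number below are assumptions made for illustration, not the values behind Table 3.

```python
from typing import Mapping

Series = Mapping[int, float]  # keyed by citing year i unless stated otherwise


def fixed_effects(total_citations: float, cohort_mean: float) -> float:
    """Eq. (10): total citation count divided by the cohort mean."""
    return total_citations / cohort_mean


def cumulative_reference_share(cites: Series, refs: Series, q: int, m: int) -> float:
    """Eq. (11): CR = sum_{i=q}^{m} C_i / R_i."""
    return sum(cites.get(i, 0.0) / refs[i] for i in range(q, m + 1))


def cumulative_referring_patent_share(cites: Series, patents: Series, q: int, m: int) -> float:
    """Eq. (12): CN = sum_{i=q}^{m} C_i / N_i."""
    return sum(cites.get(i, 0.0) / patents[i] for i in range(q, m + 1))


def citation_lifecycle(cites: Series, annual_mean: Series,
                       share_by_age: Series, q: int, m: int) -> float:
    """Eq. (13): CL = sum_{i=q}^{m} S_{i-q} * C_i / mu_i (share_by_age is keyed by age)."""
    return sum(share_by_age[i - q] * cites.get(i, 0.0) / annual_mean[i]
               for i in range(q, m + 1))


def average_citation_lifecycle(cites: Series, annual_mean: Series,
                               share_by_age: Series, q: int, m: int) -> float:
    """Eq. (14): ACL = CL / sum_{i=q}^{m} S_{i-q}."""
    weight = sum(share_by_age[i - q] for i in range(q, m + 1))
    return citation_lifecycle(cites, annual_mean, share_by_age, q, m) / weight


# Hypothetical environment: references and patents grow every year.
refs = {2000: 1000, 2001: 2000, 2002: 4000}
patents = {2000: 100, 2001: 150, 2002: 200}
annual_mean = {2000: 1.0, 2001: 5.0, 2002: 10.0}
share_by_age = {0: 0.01, 1: 0.02, 2: 0.03}

# Two hypothetical patents granted in 2000, 30 citations each.
late_cited = {2001: 10, 2002: 20}    # most citations arrive in the inflated year
early_cited = {2001: 20, 2002: 10}   # most citations arrive before inflation

fe_late, fe_early = fixed_effects(30, 15.0), fixed_effects(30, 15.0)
cr_late = cumulative_reference_share(late_cited, refs, 2000, 2002)
cr_early = cumulative_reference_share(early_cited, refs, 2000, 2002)
cl_late = citation_lifecycle(late_cited, annual_mean, share_by_age, 2000, 2002)
cl_early = citation_lifecycle(early_cited, annual_mean, share_by_age, 2000, 2002)

print(fe_late == fe_early)   # True: Eq. (10) cannot separate the two patents
print(cr_early > cr_late)    # True: Eq. (11) rewards citations earned before inflation
print(cl_early > cl_late)    # True: Eq. (13) gives the same ordering
```

The sketch only shows the ordering the indicators produce on toy data; the actual magnitudes depend on the annual means and reference shares estimated from the full corpus.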
Fig. 9. The probability densities of the citation counts during 1990–2015 received by Patents C, D, and the patents granted in 1990.
Table 4 Relative indicators of the citations received by patents for eight universities (1976–2015).
                                               Relative indicators: TCR (×10−3), TAC, TCL, TCN, TACL
University  No. pat.   TC           AC         Eq. (10)    Eq. (11)   Eq. (12)   Eq. (13)   Eq. (14)
UC          9,340 (1)  151,129 (1)  16.18 (4)  11,114 (1)  51.70 (1)  0.770 (1)  5,870 (1)  9,159 (1)
MIT         4,809 (2)  123,620 (2)  25.71 (1)  9,641 (2)   48.85 (2)  0.666 (2)  5,114 (2)  7,917 (2)
Stanford    2,822 (4)  52,228 (3)   18.51 (2)  4,592 (3)   21.26 (3)  0.285 (3)  2,304 (3)  3,730 (3)
Caltech     2,857 (3)  51,145 (4)   17.90 (3)  4,073 (4)   19.57 (4)  0.273 (4)  2,228 (4)  3,477 (4)
Hopkins     1,907 (6)  28,906 (5)   15.16 (6)  2,110 (6)   10.30 (6)  0.147 (6)  1,072 (6)  1,586 (7)
UWisc       2,535 (5)  27,785 (6)   10.96 (8)  2,128 (5)   11.35 (5)  0.152 (5)  1,218 (5)  1,880 (5)
UMich       1,826 (7)  25,310 (7)   13.86 (7)  2,046 (7)   8.27 (7)   0.125 (7)  1,000 (7)  1,662 (6)
Columbia    1,411 (8)  22,389 (8)   15.87 (5)  1,998 (8)   6.91 (8)   0.108 (8)  870 (8)    1,531 (8)
Note: “No. pat.” – the number of patents; “TC” – the total citation count; “AC” – average citation count; the others represent the sum of the values calculated with Eqs. (10)–(14) respectively. The number next to the value represents the ranking order.
Of the five relative citation indicators used for patent evaluation, Eqs. (13) and (14) are recommended for calculating the relative value of citations because these two equations consider absolute inflation, comparative inflation, and patent obsolescence. Eq. (14) further considers citation time windows and eliminates the bias associated with time, i.e., that older patents have had more time to receive citations. Notably, a time window alone cannot eliminate citation inflation: with a limited time window, the citations are received within a narrow span of years, so the impact of citation inflation would be negligible; a long time window favors new citations over old citations, the latter of which would have suffered more inflation. The strengths and weaknesses of each of these alternatives are best illustrated through a case study. For this, we chose the patents held by eight well-known US universities, as presented in the next section.
5. Case study: Patent evaluations for eight US universities
The eight universities we selected for the case study were the University of California (UC), the Massachusetts Institute of Technology (MIT), Leland Stanford Junior University (Stanford), the California Institute of Technology (Caltech), Johns Hopkins University (Hopkins), the University of Wisconsin (UWisc), the University of Michigan (UMich), and Columbia University in the City of New York (Columbia). All are prestigious universities founded before 1976 (given that our corpus for analysis consists of USPTO utility patents granted between 1976 and 2015). Additionally, some of these universities have similar total citation counts (e.g., Stanford vs. Caltech, Hopkins vs. UWisc), which makes it interesting to observe the differences between traditional indicators and the relative indicators introduced here. The citation impacts as assessed using the relative indicators are shown in Table 4.
In a traditional evaluation system, the main indicator for assessing the impact of a patent or patent portfolio is simply the total citation count (TC) and/or the average citation count (AC). From Table 4, we can see that UC, MIT, Stanford, and Caltech are ranked as the top 4 according to both of these traditional indicators as well as all relative citation indicators. However, moving down to the 5th and 6th ranked universities, Hopkins received more citations than UWisc both in total and on average according to the traditional indicators, but, when measured in terms of the relative indicators, the reverse is true. Table 5 shows the reason. Hopkins has a larger proportion of patents that were granted more than 10 years ago, but these older patents received many new citations during 2006–2015 due to serious citation inflation in 2006 and 2010. Hence,
Table 5 The number and distribution of patents and citations during 1976–2005 and 2006–2015: UWisc vs. Hopkins.
University  Patents                Old citations      New citations
UWisc       Old  1,222 (48.21 %)   9,282 (33.41 %)    15,511 (55.82 %)
            New  1,313 (51.79 %)   –                  2,992 (10.77 %)
Hopkins     Old  1,042 (54.64 %)   7,750 (26.81 %)    18,893 (65.36 %)
            New    865 (45.36 %)   –                  2,263 (7.83 %)
Note: The patents granted during 1976–2005 are treated as old patents, and the patents granted during 2006–2015 are treated as new patents; the citations received during 1976–2005 are treated as old citations, and the citations received during 2006–2015 are treated as new citations.
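The percentages in Table 5 can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys (`old_old`, `old_new`, `new_new`) are hypothetical labels for "old patents, old citations", "old patents, new citations", and "new patents, new citations", and each cell is expressed as a share of the university's total citation count.

```python
# Citation counts copied from Table 5; the cell labels are illustrative.
TABLE5 = {
    "UWisc": {"old_old": 9282, "old_new": 15511, "new_new": 2992},
    "Hopkins": {"old_old": 7750, "old_new": 18893, "new_new": 2263},
}


def citation_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each cell as a percentage of the university's total citations."""
    total = sum(counts.values())
    return {cell: 100.0 * n / total for cell, n in counts.items()}


shares = {uni: citation_shares(cells) for uni, cells in TABLE5.items()}
for uni, cells in shares.items():
    print(uni, {cell: round(pct, 2) for cell, pct in cells.items()})
```

The cell totals add up to the TC values reported for the two universities in Table 4 (27,785 and 28,906), and the share of new citations going to old patents is roughly ten percentage points higher for Hopkins than for UWisc.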
Fig. 10. The rate of change of relative citation indicators in all fields. Note: If the value is greater than 0, it means the citation impact is underestimated, and if the value is lower than 0, it means the citation impact is overestimated.
when taking citation inflation, patent age, and obsolescence into account, the results demonstrate that UWisc’s patents have had a greater real impact. Further, compared to the absolute citation indicator, total citation count (TC), the values for the relative citation indicators in Fig. 10 tell a different story. The impact of both UC’s and Hopkins’ citations decreased, suggesting that the citation impact of these universities has been overestimated. While they have benefited from citation inflation by receiving many more citations than usual, they have not received as many as other universities in relative terms. Stanford’s and Caltech’s citation impact increased across every indicator, which reflects an underestimation of their citation impact, i.e., citation inflation has had less of an effect on their citations than on those of the other universities. Overall, the relative citation indicators have more potential to describe the complete picture of the targeted patents’ citation impacts.
6. Conclusions and discussions
According to the theory of obsolescence, academic papers typically receive most of their citations in their first few years of life, after which they gradually receive fewer and fewer per year (Brookes, 1970; Burton & Kebler, 1960; de Solla Price, 1965; Gosnell, 1944; Griffith et al., 1979; Marton, 1985). This phenomenon of obsolescence also exists with patents: patent citations rapidly increase in the years immediately following a patent’s approval, peaking several years later before gradually decreasing into obsolescence. However, when we quantify citations with traditional indicators, we see that the pace of obsolescence in patents has slowed down and even stopped (see Fig. 11a). Yet when we use a relative indicator that accounts for citation inflation, the trends coalesce into patterns that more closely follow the traditional theory of obsolescence, i.e., they peak within a few years of issue, then gradually taper (see Fig. 11b, Eq. (11)).
The stark contrast between panels (a) and (b) in Fig. 11 shows how relative indicators can remove the bias introduced by citation inflation. In fact, almost every traditional approach to measuring citations, such as citation intensity or co-citation strength, is subject to some form of bias. For example, co-citation strength is a measure of how many times the same two documents are cited together by other documents (Small, 1973); citation inflation will artificially increase the value of this strength. Therefore, we assert that these traditional measures should be used with greater caution and that a relative indicator would result in a more accurate analysis. Taken as a whole, the results of this study verify that patent citation inflation has occurred frequently over the past four decades, in some cases at very high levels. Citation inflation is an under-appreciated statistical bias that affects the
Fig. 11. Citation counts vs. relative indicators for patents granted in selected years.
quantitative evaluation, so citation deflators should be used in science and technology evaluation whenever citation tallies are the basis for objective assessment. This is the motivation for the alternative approaches using relative citation indicators presented in this paper. A case study on the patents held by eight universities demonstrates the new perspective that relative indicators can bring to evaluating research performance. Compared to traditional indicators that pay more attention to quantity, relative indicators give us new insights into quality, salience, and impact. However, as with all studies, ours has several limitations that provide opportunities for further research. First, the methods we used to measure citation inflation are only able to provide a rough indication of citation inflation across time. Hence, they do not present a complete and precise picture of the landscape. Second, citation inflation is significant for USPTO patents because applicants are required by US patent law to cite all known prior art (Criscuolo & Verspagen, 2008; Li, Chambers, Ding, Zhang, & Meng, 2014), whereas, for patents granted by other countries or regions, citation inflation may not be as severe. Third, citation behavior is highly correlated with time and, even though we made some attempt to account for the time lag between publications and citations, more could be done to accurately account for this influence (Chen, Huang, Hsieh, & Lin, 2011). For example, the time lag between when a patent is lodged and when it is granted may not be constant and depends on procedures within the patent office, legal issues such as the “patentability of biotechnology inventions”, and other factors. Fourth, we used USPTO utility patents as a corpus to explore the phenomenon of citation inflation, and we do not know which findings might hold for non-patent literature. A comparative study between patent and non-patent literature would be worthwhile.
In future research, we intend to improve our measurement methods with a more in-depth analysis of the causes and effects of citation inflation in different fields.
Author contributions
Ying Huang: Conceived and designed the analysis; Performed the analysis; Wrote the paper. Lixin Chen: Collected the data; Performed the analysis; Wrote the paper. Lin Zhang: Conceived and designed the analysis; Wrote the paper.
Acknowledgments
We would like to thank Professor Liming Liang (Henan Normal University) and the reviewers for their valuable comments and suggestions. This study was supported by the National Natural Science Foundation of China (Grant Nos. 71974150, 71573085, and 71373252), the Excellent Scholars of Philosophy and Social Sciences in Henan Province (Grant No. 2018YXXZ-10), and the Fundamental Research Funds for the Central Universities.
Appendix A. The parameters in Eqs. (9)–(13) for calculating relative citation indicators
The five parameters required for Eqs. (9)–(13) are: the average citation count AC; the annual average citation count ACi (AAC); the annual reference count Ri; the annual number of patents Ni; and the average reference share Sa. Here, AC, Sa, and Ri are shown in Tables A2–A4. The remaining parameter, ACi, is the ratio of the annual citation count (see Table A1) to the annual number of patents (see Table A2).
Table A1 Annual citation counts (1976–2015).
                Citations received in that year
Granted year    1976    1977     1978     ...    2013      2014      2015
1976            869     13,705   26,756   ...    27,962    28,566    25,596
1977                    734      13,661   ...    27,442    28,459    25,497
1978                             548      ...    27,990    29,102    25,751
...                                       ...    ...       ...       ...
2013                                             12,322    94,752    166,991
2014                                                       12,888    101,297
2015                                                                 11,449
Note: The complete table can be obtained by contacting the corresponding author.
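For the most recent cohorts, every cell of a Table A1 row is visible, so the row can be summed and compared with the cohort totals reported in the "Citations" column of Table A2. The sketch below does this for the 2013 and 2014 cohorts. It is a consistency check under the assumption that a cohort's total is the sum of its row; the variable names are illustrative.

```python
# Visible Table A1 cells for two recent grant cohorts:
# {grant_year: {citing_year: citations received in that year}}.
TABLE_A1_TAIL = {
    2013: {2013: 12_322, 2014: 94_752, 2015: 166_991},
    2014: {2014: 12_888, 2015: 101_297},
}

# Cohort totals from the "Citations" column of Table A2.
TABLE_A2_CITATIONS = {2013: 274_065, 2014: 114_185}

cohort_totals = {year: sum(row.values()) for year, row in TABLE_A1_TAIL.items()}
for year, total in cohort_totals.items():
    print(year, total, total == TABLE_A2_CITATIONS[year])
```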
Table A2 Mean value – the average citation count (AC).
Year  Patents  Citations   AC      Year  Patents  Citations   AC
1976  70,194   808,420     11.52   1996  109,654  2,487,824   22.69
1977  65,215   783,467     12.01   1997  112,019  2,496,665   22.29
1978  66,087   802,079     12.14   1998  147,577  3,351,841   22.71
1979  48,841   602,030     12.33   1999  153,591  3,324,762   21.65
1980  61,815   780,683     12.63   2000  157,596  3,284,948   20.84
1981  65,770   859,782     13.07   2001  166,158  3,192,847   19.22
1982  57,877   791,844     13.68   2002  167,400  2,723,405   16.27
1983  56,863   804,717     14.15   2003  169,077  2,557,507   15.13
1984  67,212   980,072     14.58   2004  164,384  2,139,366   13.01
1985  71,668   1,074,075   14.99   2005  143,891  1,610,255   11.19
1986  70,867   1,125,500   15.88   2006  173,822  1,703,710   9.80
1987  82,960   1,423,032   17.15   2007  157,331  1,260,209   8.01
1988  77,938   1,354,772   17.38   2008  157,788  989,863     6.27
1989  95,565   1,692,464   17.71   2009  167,463  843,766     5.04
1990  90,421   1,623,849   17.96   2010  219,848  857,232     3.90
1991  96,561   1,763,363   18.26   2011  224,871  617,743     2.75
1992  97,472   1,871,231   19.20   2012  253,633  457,356     1.80
1993  98,385   1,978,713   20.11   2013  278,518  274,065     0.98
1994  101,695  2,181,028   21.45   2014  301,643  114,185     0.38
1995  101,431  2,220,132   21.89   2015  299,382  11,449      0.04
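The AC column of Table A2 is the cohort's citation total divided by its patent count. A minimal sketch for three grant years is given below; the helper name `average_citations` is an illustrative assumption.

```python
# (patents granted, citations received through 2015) for selected grant
# years, copied from Table A2.
COHORTS = {
    1976: (70_194, 808_420),
    1990: (90_421, 1_623_849),
    2013: (278_518, 274_065),
}


def average_citations(patents: int, citations: int) -> float:
    """Mean citations per patent for one grant-year cohort."""
    return citations / patents


ac = {year: round(average_citations(p, c), 2) for year, (p, c) in COHORTS.items()}
print(ac)
```

The low value for 2013 reflects the short time that cohort had to accumulate citations before the end of 2015.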
Table A3 The average reference share Sa – the average share of patents aged a being cited as references.
Age  Share    Age  Share    Age  Share    Age  Share
0    0.0023   11   0.0386   22   0.0129   33   0.0049
1    0.0293   12   0.0350   23   0.0118   34   0.0046
2    0.0566   13   0.0318   24   0.0108   35   0.0043
3    0.0645   14   0.0288   25   0.0099   36   0.0040
4    0.0655   15   0.0260   26   0.0090   37   0.0038
5    0.0629   16   0.0234   27   0.0083   38   0.0036
6    0.0586   17   0.0211   28   0.0075   39   0.0034
7    0.0542   18   0.0190   29   0.0068   40   0.0032
8    0.0498   19   0.0172   30   0.0062   41   0.0030
9    0.0458   20   0.0157   31   0.0057   42   0.0029
10   0.0419   21   0.0142   32   0.0053   43   0.0027
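The obsolescence pattern described in the main text can be read directly from Table A3. The sketch below locates the age with the largest reference share and sums the shares for ages 0–10. It reads Sa as the fraction of references that point to patents of age a, which is an interpretation of the table caption, and the variable names are illustrative.

```python
# Average reference share S_a by patent age (years 0-43), copied from Table A3.
SHARE_BY_AGE = [
    0.0023, 0.0293, 0.0566, 0.0645, 0.0655, 0.0629, 0.0586, 0.0542, 0.0498,
    0.0458, 0.0419, 0.0386, 0.0350, 0.0318, 0.0288, 0.0260, 0.0234, 0.0211,
    0.0190, 0.0172, 0.0157, 0.0142, 0.0129, 0.0118, 0.0108, 0.0099, 0.0090,
    0.0083, 0.0075, 0.0068, 0.0062, 0.0057, 0.0053, 0.0049, 0.0046, 0.0043,
    0.0040, 0.0038, 0.0036, 0.0034, 0.0032, 0.0030, 0.0029, 0.0027,
]

# Age at which patents attract the largest share of references.
peak_age = max(range(len(SHARE_BY_AGE)), key=SHARE_BY_AGE.__getitem__)

# Combined share attributed to patents aged 10 years or younger.
share_first_decade = sum(SHARE_BY_AGE[:11])

print(f"peak age: {peak_age}, share for ages 0-10: {share_first_decade:.4f}")
```

With these values the share peaks at age 4 and then declines monotonically, which matches the rise-then-taper shape expected under the theory of obsolescence.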
Table A4 The annual references provided (1976–2015).
Year  References   Year  References   Year  References   Year  References
1976  345,152      1986  472,487      1996  1,084,892    2006  2,825,709
1977  330,307      1987  576,221      1997  1,158,179    2007  2,608,498
1978  338,630      1988  560,121      1998  1,557,071    2008  2,638,822
1979  256,914      1989  721,407      1999  1,660,632    2009  3,016,660
1980  349,474      1990  683,822      2000  1,799,693    2010  4,255,724
1981  388,785      1991  739,987      2001  1,984,975    2011  4,563,705
1982  350,432      1992  781,187      2002  1,988,917    2012  4,994,636
1983  349,110      1993  818,774      2003  2,172,662    2013  5,610,371
1984  423,852      1994  905,245      2004  2,193,428    2014  6,089,071
1985  470,447      1995  956,839      2005  2,135,018    2015  5,836,037
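Combining Table A4 with the patent counts in Table A2 gives a rough view of how the reference supply has grown relative to the number of patents. The sketch below computes references per granted patent for three years. It assumes that the annual reference count in Table A4 refers to the patents granted in the same year, and the names used are illustrative.

```python
# (references provided, patents granted) for selected years, copied from
# Table A4 and Table A2 respectively.
YEARS = {
    1976: (345_152, 70_194),
    1995: (956_839, 101_431),
    2015: (5_836_037, 299_382),
}

refs_per_patent = {year: refs / patents for year, (refs, patents) in YEARS.items()}
growth = refs_per_patent[2015] / refs_per_patent[1976]

for year, value in refs_per_patent.items():
    print(f"{year}: {value:.2f} references per patent")
print(f"growth 1976 -> 2015: x{growth:.2f}")
```

Under this assumption, the average number of references per patent roughly quadruples between 1976 and 2015, in addition to the growth in the number of patents itself.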
References
Abel, A. B., & Bernanke, B. S. (2005). Macroeconomics (5th edition). Boston: Pearson/Addison Wesley.
Albert, M. B., Avery, D., Narin, F., & McAllister, P. R. (1991). Direct validation of citation counts as indicators of industrially important patents. Research Policy, 20(3), 251–259.
Barro, R. J. (1997). Macroeconomics. Cambridge, Massachusetts: MIT Press.
Brookes, B. C. (1970). The growth, utility, and obsolescence of scientific periodical literature. Journal of Documentation, 26(4), 283–294.
Burton, R. E., & Kebler, R. W. (1960). The “half-life” of some scientific and technical literatures. American Documentation, 11(1), 18–22.
Chen, C., & Hicks, D. (2004). Tracing knowledge diffusion. Scientometrics, 59(2), 199–211.
Chen, D.-Z., Huang, M.-H., Hsieh, H.-C., & Lin, C.-P. (2011). Identifying missing relevant patent citation links by using bibliographic coupling in LED illuminating technology. Journal of Informetrics, 5(3), 400–412.
Chorus, C., & Waltman, L. (2016). A large-scale analysis of impact factor biased journal self-citations. PloS One, 11(8), e0161021.
Cotropia, C. A. (2009). Modernizing patent law’s inequitable conduct doctrine. Berkeley Technology Law Journal, 24(2), 723–783.
Cotropia, C. A., Lemley, M. A., & Sampat, B. (2013). Do applicant patent citations matter? Research Policy, 42(4), 844–854.
Criscuolo, P., & Verspagen, B. (2008). Does it matter where patent citations come from? Inventor vs. examiner citations in European patents. Research Policy, 37(10), 1892–1908.
de Solla Price, D. J. (1961). Science since Babylon. New Haven: Yale University Press.
de Solla Price, D. J. (1963). Little science, big science. New York: Columbia University Press.
de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
Epicoco, M. (2013). Knowledge patterns and sources of leadership: Mapping the semiconductor miniaturization trajectory. Research Policy, 42(1), 180–195.
Érdi, P., Makovi, K., Somogyvári, Z., Strandburg, K., Tobochnik, J., Volf, P., et al. (2013). Prediction of emerging technologies based on analysis of the US patent citation network. Scientometrics, 95(1), 225–242.
Gosnell, C. F. (1944). Obsolescence of books in college libraries. College and Research Libraries, 5(2), 115–125.
Griffith, B. C., Servi, P. N., Anker, A. L., & Drott, M. C. (1979). The aging of scientific literature: A citation analysis. Journal of Documentation, 35(3), 179–196.
Hall, B. H., Jaffe, A. B., & Trajtenberg, M. (2000). Market value and patent citations: A first look. National Bureau of Economic Research Working paper series, No. 7741.
Hall, B. H., Jaffe, A. B., & Trajtenberg, M. (2001). The NBER patent citation data file: Lessons, insights and methodological tools. National Bureau of Economic Research Working paper series, No. 8498.
Henderson, R. M., Jaffe, A. B., & Trajtenberg, M. (1998). Universities as a source of commercial technology: A detailed analysis of university patenting, 1965–1988. The Review of Economics and Statistics, 80(1), 119–127.
Heneberg, P. (2014). Parallel worlds of citable documents and others: Inflated commissioned opinion articles enhance scientometric indicators. Journal of the Association for Information Science and Technology, 65(3), 635–643.
Heneberg, P. (2016). From excessive journal self-cites to citation stacking: Analysis of journal self-citation kinetics in search for journals, which boost their scientometric indicators. PloS One, 11(4), e0153730.
Hu, A. G. Z., & Jaffe, A. B. (2003). Patent citations and international knowledge flow: The cases of Korea and Taiwan. International Journal of Industrial Organization, 21(6), 849–880.
Huang, Y., Zhu, D., Qian, Y., Zhang, Y., Porter, A. L., Liu, Y., et al. (2017). A hybrid method to trace technology evolution pathways: A case study of 3D printing. Scientometrics, 111(1), 185–204.
Ioannidis, J. P. A. (2015). A generalized view of self-citation: Direct, co-author, collaborative, and coercive induced self-citation. Journal of Psychosomatic Research, 78(1), 7–11.
Jaffe, A. B., & Lerner, J. (2001). Reinventing public R&D: Patent policy and the commercialization of national laboratory technologies. The Rand Journal of Economics, 32(1), 167–198.
Jaffe, A. B., & Trajtenberg, M. (1999). International knowledge flows: Evidence from patent citations. Economics of Innovation and New Technology, 8(1–2), 105–136.
Jaffe, A. B., Trajtenberg, M., & Fogarty, M. S. (2000). Knowledge spillovers and patent citations: Evidence from a survey of inventors. The American Economic Review, 90(2), 215–218.
Lanjouw, J. O., & Schankerman, M. (2004). Patent quality and research productivity: Measuring innovation with multiple indicators. The Economic Journal, 114(495), 441–465.
Larsen, P. O., & von Ins, M. (2010). The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index. Scientometrics, 84(3), 575–603.
Li, R., Chambers, T., Ding, Y., Zhang, G., & Meng, L. (2014). Patent citation analysis: Calculating science linkage based on citing motivation. Journal of the Association for Information Science and Technology, 65(5), 1007–1017.
Martinelli, A. (2012). An emerging paradigm or just another trajectory? Understanding the nature of technological changes using engineering heuristics in the telecommunications switching industry. Research Policy, 41(2), 414–429.
Marton, J. (1985). Obsolescence or immediacy? Evidence supporting Price’s hypothesis. Scientometrics, 7(3), 145–153.
Nelson, A. J. (2009). Measuring knowledge spillovers: What patents, licenses and publications reveal about innovation diffusion. Research Policy, 38(6), 994–1005.
Opthof, T. (2013). Inflation of impact factors by journal self-citation in cardiovascular science. Netherlands Heart Journal, 21(4), 163–165.
Park, H.-W., & Suh, S.-H. (2013). Scientific and technological knowledge flow and technological innovation: Quantitative approach using patent citation. Asian Journal of Technology Innovation, 21(1), 153–169.
Persson, O., Glänzel, W., & Danell, R. (2004). Inflationary bibliometric values: The role of scientific collaboration and the need for relative indicators in evaluative studies. Scientometrics, 60(3), 421–432.
Petersen, A. M., Pan, R. K., Pammolli, F., & Fortunato, S. (2019). Methods to account for citation inflation in research evaluation. Research Policy, 48(7), 1855–1865.
Sampat, B. N. (2010). When do applicants search for prior art? The Journal of Law & Economics, 53(2), 399–416.
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265–269.
Verspagen, B. (2000). The role of large multinationals in the Dutch technology infrastructure. A patent citation analysis. Scientometrics, 47(2), 427–448.
von Wartburg, I., Teichert, T., & Rost, K. (2005). Inventive progress measured by multi-stage patent citation analysis. Research Policy, 34(10), 1591–1607.
Wang, X., Zhang, X., & Xu, S. (2011). Patent co-citation networks of Fortune 500 companies. Scientometrics, 88(3), 761–770.
Weng, C., & Daim, T. U. (2012). Structural differentiation and its implications—Core/periphery structure of the technological network. Journal of the Knowledge Economy, 3(4), 327–342.