Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 162 (2019) 9–14
www.elsevier.com/locate/procedia

7th International Conference on Information Technology and Quantitative Management (ITQM 2019)

Measuring corruption using the Internet data: Example from countries along the Belt and Road

Lili Pan^a, Qianqian Feng^b,c, Jianping Li^b,c, Xiaoqian Zhu^c, Lin Wang^c,*

a School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
b School of Public Policy and Management, University of Chinese Academy of Sciences, Beijing 100049, China
c Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
Abstract

How to scientifically measure corruption has become a key issue for both anti-corruption research and practice. By introducing the search popularity of corruption-related topics in Google Trends, this paper proposes a comprehensive corruption index called Corruption Search Index (CSI). A novel search strategy is designed, aiming to adjust the search popularity data for comparison among different countries. The empirical analysis of 63 countries along the Belt and Road shows that Singapore remains the cleanest among these countries, while Afghanistan is facing a serious corruption situation.

© 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 7th International Conference on Information Technology and Quantitative Management (ITQM 2019)

Keywords: Corruption measurement, Google Trends, Perceived corruption, Corruption Search Index
* Corresponding author. Tel.: 15829475816. E-mail address: [email protected].

1877-0509 © 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/) Peer-review under responsibility of the scientific committee of the 7th International Conference on Information Technology and Quantitative Management (ITQM 2019) 10.1016/j.procs.2019.11.250

1. Introduction

The accurate measurement of corruption has long been an important issue for both anti-corruption research and practice [1]. In the current literature, two methods are commonly applied: subjective investigations and statistical analysis of revealed corruption cases [2]. For example, the widely accepted Corruption Perceptions Index (CPI) of Transparency International and Control of Corruption (CC) from the World Bank both interview experts and companies to evaluate the corruption levels of different countries [3]. However, their explanatory power is often challenged because of the subjective biases of the respondents [4-5]. As to the latter method, some scholars take the number of reported crimes as the indicator of corruption [6-7]. As an important component of political risk, corruption cannot be evaluated with the risk measurement methods used in other fields such as finance and economics [8]. Yet counting reported crimes is also questionable given the vague and concealed nature of corruption: because many crimes happen 'under the table', anti-corruption results cannot reflect the real state of corruption. More information should be generated to comprehensively draw the map of corruption [2].

With the rapid development of the Internet, abundant data have been created as people share, spread and obtain information. These data range from government documents to blogs and academic papers, and provide varied and unique evidence for understanding certain phenomena [9-10]. Applications of Internet data include identifying risk factors or default signals from financial statement databases [11-15] or online reviews, and investigating the relationship between policy uncertainty and oil prices using news report data [16-18]. In research on corruption, Saiz and Simonsohn argue that, ceteris paribus, the more often corruption occurs, the more relevant information will be scattered on the Internet, as more people may create corruption-related documents such as blogs or tweets in an anonymous and covert manner [19]. Besides, people tend to search the Internet for relevant information, such as corruption laws and reports, before engaging in corruption [20]. Goel et al. suggest that the popularity of corruption in search engines can reflect the perception of corruption in a region [21]. Given the breadth of Internet content coverage, Internet searches can reveal more corruption cases than government documents or news media alone [9].

Against this background, this paper introduces Internet corruption search data to supplement the corruption index with relatively objective information. A comprehensive corruption index is constructed in Section 2, Section 3 presents the empirical results using data from countries along the Belt and Road, and Section 4 concludes.

2. Construction of the Comprehensive Corruption Index (CCI)

As Google.com is one of the world's most popular search engines, with a market share of 59% (90.8% for mobile and tablet devices) [22], we introduce the search popularity of corruption and related topics in Google Trends as a main indicator. In Google Trends, the search frequency of a topic is divided by the total search frequency in the specified geographic area and time range. With the maximum ratio set to 100, a number between 0 and 100 is obtained to indicate the relative search popularity. The ratio is automatically adjusted according to real-time search activity. For example, when searching for “corruption” worldwide from January 1, 2012 to December 31, 2017, the Google Trends value (Fig. 1(a)) is a time series showing the relative search popularity at each time point. In addition, the “Interest by region” module displays the comparative search popularity in different areas. As shown in Fig. 1(b), the country in the darkest blue has the highest search popularity for “corruption”.
Fig. 1. (a) Interest over time in Google Trends; (b) Interest by region in Google Trends
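The relative search popularity described above can be illustrated with a minimal sketch. It assumes raw search counts are available, which Google Trends does not actually expose; the function name `relative_popularity` and the sample counts are hypothetical, and the real service also applies sampling and rounding that are not modeled here.

```python
def relative_popularity(topic_counts, total_counts):
    """Scale topic/total search ratios so that the largest ratio equals 100."""
    if len(topic_counts) != len(total_counts):
        raise ValueError("both series must have the same length")
    # Share of all searches that concern the topic, per period (or per region).
    ratios = [t / n if n else 0.0 for t, n in zip(topic_counts, total_counts)]
    peak = max(ratios, default=0.0)
    if peak == 0:
        return [0.0 for _ in ratios]
    # The period with the highest share is set to 100; others are relative to it.
    return [100.0 * r / peak for r in ratios]


# Hypothetical counts for three periods (not real Google data).
topic = [50, 120, 80]
total = [10_000, 12_000, 16_000]
scores = relative_popularity(topic, total)
```

Because every value is expressed relative to the peak within one query, scores from two separate queries are not directly comparable, which is what motivates the search strategy that follows.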
To ensure that the Google Trends values are comparable across both time and region, a search strategy is designed as follows (see Fig. 2):

Step 1: Selecting corruption-related topics. A topic in Google Trends is a group of terms that share the same concept in any language. The “Related topics” module reports the search topics related to corruption, from which "Bribery", "Corruption" and "Political Corruption" are selected as the corruption-related topics.

Step 2: Getting nationally comparable data. For each search topic, the search scope is set to "worldwide" and the time span to one year, and the relative search popularity of each topic in each country is obtained by exporting the data in the “Interest by region” module. Because the search scope is set to “worldwide”, the Google Trends values of different countries are comparable.

Step 3: Getting yearly comparable data. To make the data comparable across years, the time span is changed to January 1, 2012 to December 31, 2017. Taking 2012 as the base year, the yearly adjustment factor is the ratio of the search frequency of each year to that of the base year. The yearly Google Trends value of each country is then multiplied by the adjustment factor.

Step 4: Integrating results. The Corruption Search Index (CSI) is obtained by adding up the adjusted Google Trends values of the three topics.
Fig. 2. Search strategy in Google Trends
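Steps 2 to 4 of the search strategy can be sketched as follows. This is only an illustration under stated assumptions: the nested-dictionary layout, the function names and the sample numbers are hypothetical, and the yearly "search frequency" is assumed to be a single worldwide value per topic and year taken from the 2012-2017 series. Any final rescaling of the CSI is not described in the text and is therefore omitted.

```python
TOPICS = ("Bribery", "Corruption", "Political Corruption")
BASE_YEAR = 2012


def adjustment_factors(yearly_interest, base_year=BASE_YEAR):
    """Step 3: ratio of each year's worldwide interest to the base year's."""
    base = yearly_interest[base_year]
    if base == 0:
        raise ValueError("base-year interest must be non-zero")
    return {year: value / base for year, value in yearly_interest.items()}


def corruption_search_index(region_scores, yearly_interest):
    """Step 4: sum the year-adjusted regional scores over the three topics.

    region_scores[topic][year][country] -> "Interest by region" value (Step 2)
    yearly_interest[topic][year]        -> worldwide interest in that year
    Returns a dict csi[year][country].
    """
    csi = {}
    for topic in TOPICS:
        factors = adjustment_factors(yearly_interest[topic])
        for year, by_country in region_scores[topic].items():
            year_row = csi.setdefault(year, {})
            for country, score in by_country.items():
                adjusted = score * factors[year]
                year_row[country] = year_row.get(country, 0.0) + adjusted
    return csi


# Hypothetical inputs: two countries, two years, identical for every topic.
region_scores = {
    topic: {2012: {"A": 40, "B": 10}, 2013: {"A": 50, "B": 20}}
    for topic in TOPICS
}
yearly_interest = {topic: {2012: 50.0, 2013: 75.0} for topic in TOPICS}
csi = corruption_search_index(region_scores, yearly_interest)
```

In this toy input the 2013 factor is 75 / 50 = 1.5 for every topic, so the 2013 regional values are scaled up before the three topics are summed.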
CC and CPI are the two most widely used and accepted corruption perception indexes in corruption research [4-5]. Transparency International and the World Bank respectively publish these indexes for major countries and regions every year. The CPI score is on a scale of 0-100, where 0 means that a country is perceived as highly corrupt and 100 means that it is perceived as very clean. The CC score follows a standard normal scale ranging from approximately -2.5 to 2.5, and a higher score stands for a cleaner country. In this paper, the CSI ranges from 0 to 100, where 0 represents the cleanest status and 100 corresponds to the highest corruption level of a country. To obtain a Comprehensive Corruption Index (CCI), the CPI, CC and CSI values are normalized according to Equation 1:
$$x' = \frac{x - \mu}{s} \tag{1}$$
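where x is the original value of CPI, CC or CSI, and µ and s are the mean and standard deviation of all observations for each index. Because higher CPI and CC values indicate a cleaner country while a higher CSI value indicates more corruption, the normalized CSI is subtracted, and the CCI is calculated from the normalized values as in Equation 2:

$$\mathrm{CCI} = \mathrm{CPI} + \mathrm{CC} - \mathrm{CSI} \tag{2}$$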
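As a minimal sketch of Equations 1 and 2, the normalization and aggregation might look like the code below. The sample standard deviation is an assumption, since the text does not say whether the sample or population form is used, and the three observations are hypothetical values rather than the paper's data.

```python
from statistics import mean, stdev


def zscores(values):
    """Equation 1: x' = (x - mu) / s for every observation of one index."""
    mu = mean(values)
    s = stdev(values)  # sample standard deviation: an assumption
    if s == 0:
        raise ValueError("standard deviation is zero; cannot normalize")
    return [(x - mu) / s for x in values]


def comprehensive_index(cpi, cc, csi):
    """Equation 2: CCI = CPI + CC - CSI, computed on normalized values."""
    return [p + c - k for p, c, k in zip(zscores(cpi), zscores(cc), zscores(csi))]


# Hypothetical observations for three countries (not the paper's data).
cpi = [85.0, 45.0, 20.0]   # higher = cleaner
cc = [2.1, -0.1, -1.4]     # higher = cleaner
csi = [10.0, 40.0, 90.0]   # higher = more corrupt
cci = comprehensive_index(cpi, cc, csi)
```

Subtracting the normalized CSI keeps the direction consistent: a higher CCI always means a cleaner country.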
3. Empirical analysis

Transparency International modified the CPI calculation rule in 2012, which makes the CPI scores for years before 2012 incomparable with those of subsequent years. The empirical analysis therefore uses data covering the period 2012-2017 for 63 countries along the Belt and Road (B&R). Table 1 shows the six-year average CCI and CPI scores for B&R countries, where Rank1 is the ranking by CCI score and Rank2 is the ranking by CPI score, both from the cleanest country to the most corrupt. The CCI is highly correlated with the CPI (the Pearson correlation coefficient reaches 0.972 and the Kendall correlation coefficient reaches 0.859). However, if we calculate the correlation between CCI and CPI per year, the Pearson and Kendall correlation coefficients decline to 0.877 and 0.749.

Table 1 Corruption measurement for countries along the Belt and Road
| Country | CCI | Rank1 | CPI | Rank2 |
|---|---|---|---|---|
| Singapore | 4.79 | 1 | 85.00 | 1 |
| United Arab Emirates | 3.79 | 2 | 69.00 | 2 |
| Estonia | 2.97 | 5 | 68.67 | 3 |
| Qatar | 3.45 | 3 | 66.67 | 4 |
| Bhutan | 2.78 | 6 | 64.67 | 5 |
| Israel | 2.76 | 7 | 61.33 | 6 |
| Poland | 3.01 | 4 | 60.50 | 7 |
| Slovenia | 2.48 | 9 | 59.67 | 8 |
| Brunei | 1.35 | 14 | 58.83 | 9 |
| Lithuania | 2.65 | 8 | 58.00 | 10 |
| Latvia | 1.52 | 13 | 54.50 | 11 |
| Georgia | 1.95 | 10 | 53.00 | 12 |
| Czech Republic | 1.32 | 15 | 52.67 | 13 |
| Hungary | 1.76 | 11 | 51.17 | 14 |
| Malaysia | 0.52 | 23 | 49.50 | 15 |
| Slovakia | 1.20 | 18 | 49.17 | 16 |
| Croatia | 1.74 | 12 | 48.50 | 17 |
| Jordan | 1.20 | 17 | 48.50 | 18 |
| Saudi Arabia | 1.25 | 16 | 47.67 | 19 |
| Bahrain | 0.89 | 20 | 46.33 | 20 |
| Oman | 0.79 | 22 | 45.50 | 21 |
| Romania | 0.15 | 28 | 45.33 | 22 |
| Turkey | 0.88 | 21 | 44.50 | 23 |
| Montenegro | 0.97 | 19 | 43.67 | 24 |
| Kuwait | 0.34 | 26 | 43.33 | 25 |
| Bulgaria | 0.04 | 30 | 41.67 | 26 |
| Serbia | 0.43 | 25 | 40.83 | 27 |
| Bosnia and Herzegovina | -0.76 | 40 | 39.67 | 28 |
| Macedonia | 0.50 | 24 | 39.17 | 29 |
| India | -0.05 | 32 | 38.00 | 30 |
| Sri Lanka | 0.27 | 27 | 37.67 | 31 |
| Mongolia | 0.01 | 31 | 37.67 | 32 |
| Thailand | 0.06 | 29 | 36.67 | 33 |
| Philippines | -0.68 | 38 | 35.33 | 34 |
| Armenia | -0.09 | 33 | 35.00 | 35 |
| Albania | -0.85 | 41 | 35.00 | 36 |
| Maldives | -0.36 | 34 | 34.83 | 37 |
| Indonesia | -1.38 | 48 | 34.67 | 38 |
| Belarus | -0.44 | 35 | 34.50 | 39 |
| Egypt | -0.76 | 39 | 33.83 | 40 |
| Moldova | -1.00 | 43 | 33.33 | 41 |
| Timor-Leste | -0.57 | 36 | 32.00 | 42 |
| Vietnam | -0.59 | 37 | 32.00 | 43 |
| Pakistan | -1.30 | 47 | 29.67 | 44 |
| Nepal | -0.99 | 42 | 29.00 | 45 |
| Azerbaijan | -1.70 | 51 | 29.00 | 46 |
| Kazakhstan | -1.88 | 53 | 28.50 | 47 |
| Russia | -2.16 | 54 | 28.33 | 48 |
| Lebanon | -1.70 | 50 | 28.17 | 49 |
| Iran | -1.03 | 45 | 27.67 | 50 |
| Ukraine | -2.50 | 58 | 27.17 | 51 |
| Kyrgyzstan | -2.58 | 60 | 26.67 | 52 |
| Bangladesh | -1.02 | 44 | 26.17 | 53 |
| Laos | -1.14 | 46 | 26.00 | 54 |
| Tajikistan | -1.80 | 52 | 23.17 | 55 |
| Myanmar | -1.57 | 49 | 22.83 | 56 |
| Cambodia | -2.73 | 62 | 21.00 | 57 |
| Uzbekistan | -2.55 | 59 | 19.00 | 58 |
| Turkmenistan | -2.32 | 55 | 18.33 | 59 |
| Syria | -2.44 | 56 | 18.00 | 60 |
| Yemen | -2.63 | 61 | 18.00 | 61 |
| Iraq | -2.47 | 57 | 16.83 | 62 |
| Afghanistan | -3.79 | 63 | 11.50 | 63 |

Note: Palestine is not included in the table because its data are missing in Google Trends.
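The Pearson and Kendall coefficients reported for CCI and CPI could be computed along the lines of the sketch below. It uses only six rows taken from Table 1, so its output will not match the full-sample figures in the text, and the tau-b variant of Kendall's coefficient is an assumption because the paper does not state which variant was used. The function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b), which corrects for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every count
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + ties_x) * (pairs + ties_y))


# Six rows from Table 1: Singapore, Estonia, Poland, Malaysia, Thailand, Afghanistan.
cci = [4.79, 2.97, 3.01, 0.52, 0.06, -3.79]
cpi = [85.00, 68.67, 60.50, 49.50, 36.67, 11.50]
r = pearson(cci, cpi)
tau = kendall_tau_b(cci, cpi)
```

In this subset only the Estonia and Poland pair is ordered differently by the two indexes, which is why the rank correlation falls below 1 even though the linear correlation is very high.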
As the histogram in Fig. 3 shows, most countries have a CCI score between -3 and +4. Only two countries, Singapore and Afghanistan, fall outside this range. Singapore's CCI is one point higher than that of the UAE, which ranks second in Table 1. Moreover, Singapore's CPI score has ranked first among B&R countries for consecutive years, indicating that the country remains very clean. Afghanistan's CCI score is about one point lower than that of Cambodia, which has the second-lowest score in our database. In the CPI rankings, Afghanistan ranked last for four years, indicating a serious corruption situation.
Fig. 3. Histogram of CCI distribution
4. Conclusion

Considering the shortcomings of the current corruption indexes, this paper constructs a comprehensive corruption index by introducing Internet data. Widely accepted measures of perceived corruption and public behavior in search engines are combined, which is a meaningful attempt to explore novel methods for corruption measurement. The index is applicable to different countries, and future work can include other relevant data from papers, surveys, websites and other open data sources [23]. Moreover, although equal weighting is a straightforward way of combining the three indexes, the inherent relationships among the indexes and more diverse aggregation methods [24-25] will be explored in future research.
References

[1] Heywood P. Political corruption: Problems and perspectives. Political Studies 1997; 45(3): 417-435.
[2] Svensson J. Eight questions about corruption. Journal of Economic Perspectives 2005; 19(3): 19-42.
[3] Ko K, Samajdar A. Evaluation of international corruption indexes: Should we believe them or not? The Social Science Journal 2010; 47(3): 508-540.
[4] Donchev D D, Ujhelyi G. What do corruption indices measure? Economics & Politics 2014; 26(2): 309-331.
[5] Chabova K. Measuring corruption in Europe: Public opinion surveys and composite indices. Quality & Quantity 2016; 51(4): 1877-1900.
[6] Lisciandra M, Millemaci E. The economic effect of corruption in Italy: A regional panel analysis. Regional Studies 2016; 51(9): 1387-1398.
[7] Dong B, Torgler B. Causes of corruption: Evidence from China. China Economic Review 2013; 26: 152-169.
[8] Zhu X Q, Wang Y H, Li J P. Operational risk measurement: A loss distribution approach with segmented dependence. Journal of Operational Risk 2019; 14(1): 25-44.
[9] Andersen T B, Bentzen J, Dalgaard C J L, et al. On the impact of digital technologies on corruption: Evidence from U.S. States and across countries. Discussion Papers 2008; 8-11.
[10] Xiang C, Lu J. Do local investors have information advantages? An empirical study with Baidu search. Chinese Journal of Management Science 2019; 27(4): 25-36. (in Chinese)
[11] Wei L, Li G, Zhu X, et al. Developing a hierarchical system for energy corporate risk factors based on textual risk disclosures. Energy Economics 2019; 80: 452-460.
[12] Chen L, Xie Y, Li P, et al. The signal of default risk from the description-text based on the empirical research of p2p lending. Chinese Journal of Management Science 2019; 27(4): 37-47. (in Chinese)
[13] Wei L, Li G, Zhu X, et al. Discovering bank risk factors from financial statements based on a new semi-supervised text mining algorithm. Accounting & Finance 2019; 59(3): 1525-1558.
[14] Wei L, Li G, Li J, et al. Bank risk aggregation with forward-looking textual risk disclosures. The North American Journal of Economics and Finance 2019; 50: 1-16.
[15] Li J, Yao Y, Xu Y, et al. Consumer's risk perception on the Belt and Road countries: Evidence from the cross-border e-commerce. Electronic Commerce Research 2019; doi: 10.1007/s10660-019-09342-x.
[16] Chen X, Sun X, Li J. How does economic policy uncertainty react to oil price shocks? A multi-scale perspective. Applied Economics Letters 2019; doi: 10.1080/13504851.2019.1610704.
[17] Sun X, Chen X, Wang J, et al. Multi-scale interactions between economic policy uncertainty and oil prices in time-frequency domains. The North American Journal of Economics and Finance 2018; doi: 10.1016/j.najef.2018.10.002.
[18] Ji Q, Li J, Sun X. Measuring the interdependence between investor sentiment and crude oil returns: New evidence from the CFTC's disaggregated reports. Finance Research Letters 2019; 30: 420-425.
[19] Saiz A, Simonsohn U. Proxying for unobservable variables with internet document-frequency. Journal of the European Economic Association 2013; 11(1): 137-165.
[20] Ryvkin D, Serra D, Tremewan J. I paid a bribe: An experiment on information sharing and extortionary corruption. European Economic Review 2017; 94: 1-22.
[21] Goel R K, Nelson M A, Naretta M A. The internet as an indicator of corruption awareness. European Journal of Political Economy 2012; 28(1): 64-75.
[22] Bulut L. Google trends and the forecasting performance of exchange rate models. Journal of Forecasting 2018; 37(3): 303-315.
[23] Zhu X Q, Wei L, Wu D S, et al. A general framework for constructing bank risk data sets. Journal of Risk 2018; 21(1): 37-59.
[24] Li J, Yao X, Sun X, et al. Determining the fuzzy measures in multiple criteria decision aiding from the tolerance perspective. European Journal of Operational Research 2018; 264(2): 428-439.
[25] Yao X, Li J, Sun X, et al. Insights into tolerability constraints in multi-criteria decision making: Description and modeling. Knowledge-Based Systems 2018; 162: 136-146.