5 Global performance on commercial search engines
This is the first chapter that discusses the results of our study. It is devoted exclusively to the cybermetric analysis of biotechnology companies in relation to their global performance on the main horizontal sources, i.e., search engines and web metric information products. It should be clarified at this point that the overall purpose we are pursuing, more than an analysis of the biotechnology sector in itself (a case study as valid as the analysis of any other sector), is to demonstrate the application of the analysis model described in Chapter 3, A cybermetric analysis model to measure private companies. Even when discussing the most meaningful performance results for the main web spaces of the 184 companies studied, the real, underlying intent of this particular chapter (and the entire block in general) is to illustrate the different categories and metrics that exist and the statistical properties of web data. Unlike Chapter 3, A cybermetric analysis model to measure private companies, which is conceptual, this chapter delves deeper into the sources and procedures for obtaining metrics and, especially, into how to interpret them. The chapter is therefore broken down according to the main categories of the analysis model (size, mention, usage, and form). It then describes the specific metrics that have been captured, the raw results and, finally, the most appropriate interpretation thereof.

The approach is fully aligned with the branch of instrumental cybermetrics (Orduna-Malea & Aguillo, 2014), based on the study of data sources. It is a key area, and many studies have attempted to analyze the behavior of the main data sources for their use in cybermetrics. Readers will find in the references section of this chapter some of the most important publications on the study of search engines from a cybermetrics perspective, works that have addressed issues of crucial importance, such as the coverage of content on search engines (Aguillo, Ortega, & Granadino, 2006; Thelwall, 2008b), their updating processes (Lewandowski, Wahlig, & Meyer-Bautor, 2006; Hellsten, Leydesdorff, & Wouters, 2006; Thelwall, 2001), their potential bias based on sociocultural and linguistic aspects (Van Couvering, 2010; Vaughan & Thelwall, 2004; Vaughan & Zhang, 2007), the stability/variability of their results over time (Bar-Ilan, 1999; 2002), the relevance and accuracy of their results (Pirkola, 2009; Uyar, 2009), their effectiveness when retrieving relevant hyperlinks (Jansen & Molina, 2006), the effect of different versions being retrieved by the same browser (Wilkinson & Thelwall, 2013), and different techniques and procedures for extracting data accurately (Bar-Ilan, 2001; Thelwall, 2008a; Thelwall & Sud, 2011).

These publications, mostly empirical, were concentrated in the 2000s and declined gradually during the following decade. Undoubtedly, the continuous changes made by search engines (disabling commands and basic metric capabilities) and the development of semantic search engines (providing answers rather than documents in which those answers might be encountered) contributed to a certain abandonment of the study of search engines as data sources (Thelwall, 2010), relegating cybermetrics to relative obscurity.

Nevertheless, it should be noted that the majority of these studies focused on data mining in academic settings, primarily institutions (universities, research centers), publications (electronic journals, thematic and institutional repositories, publishers), and authors (personal web spaces). Their purpose was, broadly speaking, to gather hyperlinks in order to harvest complementary data to substantiate the impact of academic web content and the explicit relationships between the various stakeholders in the production, dissemination, and consumption of scholarly material. These were relationships that, in most cases, were not made explicit in the literature and that bibliometric techniques were therefore unable to detect. The emergence of social media platforms (generally opaque to commercial search engines as far as cybermetrics is concerned), the development of academic search engines (Ortega, 2014) and platforms for academic profiles (Ortega, 2016), together with the technical (data mining) and conceptual (spam) limitations of hyperlinks, ended up dragging the scholarly community that had been working on academic cybermetrics to other regions of hyperspace. This planted the seed for the growth of the current altmetrics movement (Holmberg, 2016; Thelwall, 2014).

However, the proven existence of limitations in the measuring instruments (search engines) does not necessarily mean we should put down tools and give up studying them, nor should we confine their potential uses to academia. The aim of this book is precisely to enable this discipline to branch out into the setting of private companies. The approach will be general, although we will look briefly at the academic activity of companies—a relatively small piece of the pie—in Chapter 7, Specific performance on specialized search engines. Search engines are sources of extremely valuable data for understanding the web performance of a company, even if we have to abandon the idea of accuracy and completeness for the sake of obtaining approximate indicators, which are no less important and should not be overlooked.
5.1 Outlook
Table 5.1 is a descriptive overview of the metrics selected to illustrate the web performance of the biotech companies (the level of analysis is the web domain node), together with the sources used to extract them.

Table 5.1 Web performance on search engines: Metrics and sources used

  Category   Metric                                                           Source
  Size       Page count                                                       Google, Majestic, Bing, Baidu, Yandex
  Mention    Total external inlinks                                           Majestic
             Total URL in-mentions from platform                              Twitter
             Total sites (inlinks)                                            Majestic
             Citation flow                                                    Majestic
             Trust flow                                                       Majestic
  Usage      Total visits; average visit duration; pages/visit; bounce rate   SimilarWeb
             Bounce rate; daily pageviews/visit; daily time per visit         Alexa
  Formal     Level A problems and warnings                                    TAWDIS
             Mobile PageSpeed performance                                     PageSpeed Insights
             Desktop PageSpeed performance                                    PageSpeed Insights
             SERP rank (company name keyword)                                 Google

The astute reader will have realized that we have not included metrics for all the categories of the analysis model (reach, opinion, topology, etc.). As we already mentioned in Chapter 4, General methodology, not all metric categories are activated at each level of analysis. Reach metrics, to give one example, are difficult to apply to a web domain node. If we encounter a web space in the form of a blog, the number of subscribers via content syndication (RSS or Atom) would constitute a valid example of a reach metric. However, these data are only available for web spaces that are blogs, hence the reason for not including the metric. Another significant absence is that of topological measurements; they depend on the elements that are considered to be nodes and, in particular, on the type of link or relationship to be studied between these nodes. They are, therefore, metrics that will be obtained from other metrics that express degrees of relationship. An example of this would be mentions (the mention of one node by another establishes a conceptual and/or semantic relationship between the two). Topological measurements will be illustrated with examples of selective mention indicators in Chapter 6, Selective performance on commercial search engines.

The metrics covered in this chapter are: Size (from different search engines), total mentions based on different units (links and websites) and weightings (combined indices such as Trust Flow), usage (visitors, bounce rate, duration of a session, etc.), and formal (relating to web usability, page speed, and search engine optimization, SEO). These are, perhaps, the most relevant of the many metrics that could apply. There are many sources and metrics, but we have chosen only those that we believe to be the most illustrative, taking into account not only the length but also the purpose of this book: To propose an analysis model and apply it empirically. With readability in mind, we decided to divide the chapter into different sections for each of the metric categories. In each we describe the procedures used to obtain the data, examine the final results, and then close with a discussion of the implications.
5.2 Size
5.2.1 Data gathering

Size data were determined by the page count metric. Five different search engines were used: Two general commercial search engines (Google and Bing), two general search engines targeting specific geographic communities, China (Baidu) and Russia (Yandex), and finally a search engine that specializes in hyperlinks (Majestic). In all cases, the procedure was based on calculating the number of indexed URLs, not on counting actual files.

In the case of Google (https://www.google.com), Bing (https://www.bing.com), Baidu (http://www.baidu.com), and Yandex (https://www.yandex.com), the data were extracted manually through the use of the "site:" search command, with the following format: site:company.com. Subsequently we captured the hit count estimates (HCE) returned by the search engine results page. In the case of Majestic, the data were obtained automatically from the API provided by the search engine (https://developer-support.majestic.com/api); the name of this metric on Majestic is "Indexed URLs". The Majestic data were gathered on December 26, 2016, while the data from Google, Bing, Baidu, and Yandex were obtained on January 2, 2017. The aim was to make all the results as comparable as possible.
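To make the procedure concrete, the following is a minimal Python sketch (not part of the original study) that builds the site: queries for a list of company domains; the query-URL templates are the engines' standard front-end search URLs, and the hit count estimates themselves are still read off the results pages by hand, as described above.

```python
from urllib.parse import quote_plus

# Query-URL templates for the four general search engines used in the text.
SEARCH_URLS = {
    "google": "https://www.google.com/search?q={q}",
    "bing":   "https://www.bing.com/search?q={q}",
    "baidu":  "https://www.baidu.com/s?wd={q}",
    "yandex": "https://yandex.com/search/?text={q}",
}

def site_queries(domains):
    """Build, per engine, the 'site:' query URL for each company domain."""
    return {
        engine: {d: tpl.format(q=quote_plus(f"site:{d}")) for d in domains}
        for engine, tpl in SEARCH_URLS.items()
    }

if __name__ == "__main__":
    # Illustrative domains taken from this chapter; the hit count estimates (HCE)
    # displayed on each results page are then recorded manually, as in the study.
    for engine, queries in site_queries(["nanostring.com", "qiagen.com"]).items():
        for domain, url in queries.items():
            print(f"{engine:7s} {domain:16s} {url}")
```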
5.2.2 Results

If we look at the data retrieved by Majestic, the largest web domain (i.e., with the most indexed URLs) was NanoString Technologies (nanostring.com), with 220,302 URLs. However, the number of URLs retrieved for the same domain is significantly lower in Google (9180), absurdly low in Bing (2300), and almost negligible in Yandex (727) and Baidu (57). The coverage of search engines is therefore woefully uneven. The Yandex and Baidu results are understandable in as much as they are search engines that focus on their respective regions, but the differences between Google, Bing, and Majestic are completely incomprehensible. Yet this issue does not stop here; it is not just a problem of coverage, which could affect all the analyzed web domains equally. The real problem is that Qiagen (qiagen.com) is the largest domain for Google, Baidu, and Yandex, while for Bing it is FibroGen Inc (fibrogen.com), with an absolutely sky-high value (more than two million URLs). We are therefore faced with a paradigmatic example of the effect of external variables: Size depends on the angle from which you look at it.

And that is not the only such factor. How good, or bad, the capabilities of a search engine are will determine the depth of indexing, and the more that is indexed, the higher the percentage that will be found of the actual number of URLs on a particular web domain. Other parameters also come into play, including the fulfilment of certain technical criteria on websites created under the analyzed domains: What is known as search engine compliance. What this means is that there is a possibility that the search engine does not locate more content (and the corresponding URLs) because the websites have been built without respecting certain standards or requirements needed for indexing by a search engine, or it might simply be a case of malpractice. Therefore, the different coverage and the technical aspects of content creation have a parallel influence on estimating web size metrics, and may offer a distorted view of reality. Proof of this can be seen in the differences between the five search engines when estimating the size of the 184 biotech companies. Fig. 5.1 is a box-and-whisker plot showing not only the difference in average coverage but also the high variability of the data. Surprisingly, both Bing and Majestic provide—on average—higher results than Google.

Another discernible effect, apart from the coverage of search engines and the technical characteristics of the content, is the rounding of the results. As noted earlier, search engines provide an estimated number of results (hence the name "hit count estimates"). This may affect the results to a greater or lesser extent, depending on the order of magnitude in which we find ourselves. In order to illustrate this effect we calculated the percentage of both odd and even results, and those ending in "0" (Table 5.2), since these numbers can be the result of rounding. Our results indicate that odd and even numbers of results are relatively balanced in Majestic, Yandex, and Baidu. However, the majority of the results are even numbers in both Google (79.3%) and Bing (94.0%). Moreover, the percentage of numbers ending in zero amounts to 89.1% in Bing.
Figure 5.1 Box-and-whisker plot: Page Count according to different search engines.
Table 5.2 Nature of size results in search engines

           Majestic        Google          Bing            Yandex          Baidu
  Type     N      %        N      %        N      %        N      %        N      %
  Odd      90     48.9     38     20.7     11     6.0      92     50.0     96     52.2
  Even     94     51.1     146    79.3     173    94.0     92     50.0     88     47.8
  Zero     16     8.7      117    63.6     164    89.1     22     12.0     21     11.4
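The rounding profile reported in Table 5.2 is easy to reproduce once the hit count estimates are available; a minimal sketch, with invented values standing in for the 184 observations per engine:

```python
def rounding_profile(counts):
    """Percentage of odd, even, and zero-ending values among hit count estimates."""
    n = len(counts)
    odd = sum(1 for c in counts if c % 2 == 1)
    zero = sum(1 for c in counts if c % 10 == 0)
    return {
        "odd_pct": round(100 * odd / n, 1),
        "even_pct": round(100 * (n - odd) / n, 1),
        "zero_pct": round(100 * zero / n, 1),
    }

# Invented sample; in the study one list per engine holds 184 values.
print(rounding_profile([9180, 2300, 727, 57, 220302, 41000]))
```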
Therefore, to better understand the results it is essential that we fully grasp the statistical nature of the web data. For each of the five search engines we performed various normality tests (Shapiro–Wilk and Anderson–Darling) to determine the data distribution. All the tests indicated that nonnormality is significant (α < 0.01; P-value < 0.0001), i.e., that the data do not follow a normal distribution. This fact, which seems to be common to web data (Barabási & Albert, 1999), inevitably conditions the analysis and interpretation of most of the cybermetric metrics that we will be exploring throughout the remaining chapters of this block. It is therefore appropriate to apply nonparametric tests of independence to determine the statistical differences between the five size samples obtained (one for each search engine). For this, we applied the Kruskal–Wallis test (comparison of k independent samples) and the Friedman test (comparison of k paired samples), which demonstrated that the statistical difference between the five samples was significant (α < 0.01; P-value < 0.0001; DF: 4). There is now no doubt that each search engine has provided significantly different data.

But does this mean that there is a total lack of correlation between the results of the search engines? Not necessarily. So far we have seen that the data from each search engine have a nonnormal distribution, and that the results provided by each search engine are statistically different; but there could still be some correlation between them despite the fact that they provide results in different orders of magnitude. To find out for sure, we carried out an analysis of similarity using the correlation coefficients obtained by two different methods: Pearson (Table 5.3, top) and Spearman (Table 5.3, bottom).

Table 5.3 Correlation matrix using Pearson (top) and Spearman (bottom)

  Pearson     Majestic   Google   Bing    Yandex   Baidu
  Majestic    1
  Google      0.14       1
  Bing        0.00       0.02     1
  Yandex      0.34a      0.74a    -0.01   1
  Baidu       0.13       0.98a    0.01    0.77a    1

  Spearman    Majestic   Google   Bing    Yandex   Baidu
  Majestic    1
  Google      0.83a      1
  Bing        0.54a      0.50a    1
  Yandex      0.59a      0.64a    0.43a   1
  Baidu       0.30a      0.32a    0.42a   0.34a    1

a Significant values at the significance level α < 0.010 (two-tailed test).

The differences obtained between the two methods are staggering. We only have to look at the correlation between Majestic and Google to appreciate this disparity: The correlation goes from virtually nonexistent (Rp = 0.14) to highly positive (Rs = 0.83). To discover the reason for this we have to look again at the nonnormal distribution of the data. The Pearson correlation is not appropriate for nonnormal distributions. Web data generally follow a power-law distribution, in which a few results have very high values while the vast majority have low values—the famous long tail (Anderson, 2006). One possible method for obtaining correlations is to log-normalize the data before performing the test of similarity, or to use the Spearman correlation on a complementary basis. In this case (Table 5.3, bottom) a weak but significant correlation between the data may be observed, especially between Majestic and Google.

Moreover, this finding points to an interesting fact, which is, quite simply, the high representativity of the size data from Majestic, which could thus be combined with other data from the same source in synthetic indicators (such as the Web Impact Factor, an indicator created by dividing the number of mentions received by web size). This solution is always preferable to building combined indicators from different sources (e.g., considering Google as the source for size and Majestic for mentions). But we must above all be cautious and treat the results with due consideration. The tests of independence have already shown that each search engine offers completely different results; the strong positive correlation indicates, in turn, that there is a certain concordance between search engines when identifying companies with a smaller or larger size (regardless of the value or order of magnitude), always within a margin or cluster of companies.
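The statistical workflow just described can be scripted with SciPy; the sketch below uses synthetic heavy-tailed data purely as a placeholder for the 184 page counts obtained per engine:

```python
import numpy as np
from scipy import stats

# Placeholder data: two synthetic, heavy-tailed samples standing in for the
# page counts of the 184 companies in two search engines.
rng = np.random.default_rng(42)
engine_a = rng.pareto(1.5, 184) * 1000
engine_b = engine_a * rng.lognormal(0.0, 1.0, 184)

# Normality tests (web data of this kind typically reject normality).
print(stats.shapiro(engine_a))
print(stats.anderson(engine_a, dist="norm"))

# Nonparametric tests of independence across engines (extend with all five samples).
print(stats.kruskal(engine_a, engine_b))
print(stats.friedmanchisquare(engine_a, engine_b, engine_b * 0.5))

# Pearson on raw values, Spearman on ranks, and Pearson on log-normalized data.
print(stats.pearsonr(engine_a, engine_b))
print(stats.spearmanr(engine_a, engine_b))
print(stats.pearsonr(np.log1p(engine_a), np.log1p(engine_b)))
```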
5.3 Mention
5.3.1 Data gathering

The sources used for this category were Majestic and, especially, Twitter.
5.3.1.1 Majestic

In the case of Majestic, the data were obtained automatically through the API (in the same way as the size data in the previous section) on December 26, 2016. There is a very wide variety of metrics related to hypertext mentions, although they are obviously linked to the coverage of Majestic. The metrics collected (from the Fresh Index) were the following:

– Total External Inlinks: Total number of links received by a website.
– Total Sites (Inlinks): Total number of websites from which at least one link has been received.
– Citation Flow: Index (0 to 100) that indicates the number of links received by a website.
– Trust Flow: Index (0 to 100) that indicates the quality of the links received by a website.
These metrics are based on hyperlinks and do not place any restrictions on their source, unlike selective metrics, which will be specifically addressed in the next chapter.
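These metrics can also be pulled programmatically. The sketch below is only an illustration of how such a call could look: it assumes Majestic's GetIndexItemInfo command and its commonly documented response fields (ExtBackLinks, RefDomains, CitationFlow, TrustFlow), so both the command and the field names should be verified against the current API reference, and a valid API key is required.

```python
import requests

MAJESTIC_ENDPOINT = "https://api.majestic.com/api/json"  # JSON flavour of the API (assumed)

def majestic_mention_metrics(domain, api_key):
    """Fetch mention metrics for one domain from Majestic's Fresh Index (sketch)."""
    params = {
        "app_api_key": api_key,
        "cmd": "GetIndexItemInfo",   # command name to verify against Majestic's docs
        "items": 1,
        "item0": domain,
        "datasource": "fresh",       # Fresh Index, as used in the chapter
    }
    data = requests.get(MAJESTIC_ENDPOINT, params=params, timeout=30).json()
    item = data["DataTables"]["Results"]["Data"][0]
    # Field names follow Majestic's usual naming; verify before relying on them.
    return {
        "total_external_inlinks": item.get("ExtBackLinks"),
        "total_sites_inlinks": item.get("RefDomains"),
        "citation_flow": item.get("CitationFlow"),
        "trust_flow": item.get("TrustFlow"),
    }
```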
5.3.1.2 Twitter

The collection of hyperlinks embedded in tweets has already been tested in cybermetrics (Orduna-Malea, Torres-Salinas, & Delgado López-Cózar, 2015; Vaughan, 2015), demonstrating that Twitter is a representative source of hyperlinks, at least in the case of world-class universities. In order to demonstrate that it can also be a representative source for companies, we retrieved the total number of hyperlinks from Twitter that pointed to each of the websites in our sample of biotech companies. To do this, we used the following query in Twitter's advanced search interface: "www.domain-1.com" -from:@account-1. The "from" element was used to remove self-mentions, i.e., links included in tweets published by the official account of the relevant company (@account-1). In this way, we made sure that all the hyperlinks were truly external (which is what Majestic does automatically). Moreover, the quotation marks and the term "www" were also included in the query to eliminate data noise and boost the accuracy of the results. All the queries were performed manually on December 22, 2016; the number of tweets obtained for each query was counted manually.
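A small helper for composing these queries (and the corresponding search URLs, for manual counting) across the whole sample could look like the sketch below; the example handle is assumed purely for illustration.

```python
from urllib.parse import quote

def twitter_link_query(domain, official_handle):
    """Compose the advanced-search query used to count external links to a domain."""
    # Quotation marks and "www" reduce noise; -from: excludes the company's own tweets.
    return f'"www.{domain}" -from:@{official_handle}'

def twitter_search_url(query):
    """Web search URL where the matching tweets can be counted manually."""
    return f"https://twitter.com/search?q={quote(query)}"

# Illustrative example only; the handle is assumed, not taken from the study.
q = twitter_link_query("alexion.com", "AlexionPharma")
print(q)
print(twitter_search_url(q))
```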
5.3.2 Results

In the previous section on size we used different sources for the same indicator (size as an indicator of productivity); by contrast, we now have different metrics (from the same source or not) that relate to the same indicator (mention as an indicator of visibility). In total we calculated five mention metrics for the websites, with the aim of producing a varied sample. Table 5.4 is a correlation matrix (Spearman) with the correlation coefficients (Rs) between the five mention metrics.

Table 5.4 Correlation matrix between mention metrics

                     Total sites   Total inlinks-1   Total inlinks-2   Trust flow   Citation flow
  Total sites        1
  Total inlinks-1    0.72a         1
  Total inlinks-2    0.58a         0.44a             1
  Trust flow         0.64a         0.45a             0.42a             1
  Citation flow      0.73a         0.61a             0.40a             0.75a        1

Total inlinks-1: Powered by Majestic; Total inlinks-2: Powered by Twitter.
a Significant values at the significance level α < 0.010 (two-tailed test).

The results confirm our hunch that there is a high positive correlation between virtually all the metrics that were calculated. The web domains that received the most external links are those that—generally—receive links from a greater number of web spaces or those that have better Citation Flow scores. The Trust Flow values, conversely, correlate to a lesser extent, which is logical if we consider that a domain can receive many links, but not all of them will necessarily come from high-quality sources. So we may find exceptions (outliers) in the shape of companies that receive a high total number of links, of which few are of sufficient quality (perhaps because of spam or because they are unable to create content that generates sufficient interest), thus causing the degree of correlation to suffer. Still, the number of total links and the Trust Flow values show a significant, moderate correlation (Rs = 0.45).

To conclude this section, we shall complete the information extracted from Twitter. The hyperlinks included in tweets directed at individual companies show a moderate, but significant, correlation with the total number of links that these companies receive (Rs = 0.44). One of the most interesting aspects of the extraction of hyperlinks from Twitter is that the system is able to detect alias URLs. The limited space allowed in a tweet (140 characters), and the length of some URLs, have popularized the use of URL shorteners, such as Google URL Shortener (https://goo.gl), Bitly (https://bitly.com), and TinyURL (https://tinyurl.com), among others. The Twitter search commands are able to detect the searched-for URL even if a shortener has been used. Fig. 5.2 shows the example of Alexion Pharmaceuticals (alexion.com). The URL embedded in the second tweet directs the user directly to alexion.com. However, the others do so indirectly. The first tweet (https://t.co/z1Q1EMjHvC) redirects to http://www.alexion.com/News/Featured-Stories/First-Wave-Alexion.aspx, while the URL embedded in the third tweet of the figure (https://t.co/FOi6loMMEf) redirects to the following URL:

http://www.alexion.com/patients/ultra-rare-diseases.aspx?utm_content=buffer1d2e8&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer

This last case is interesting because it is a URL tagged using UTM (Urchin Traffic Monitor) parameters, through the Facebook shortener (fb.me), which directs to a page that does not currently exist (it was probably eliminated after an advertising campaign).

Figure 5.2 Searching for URLs embedded in Twitter. Source: Twitter.

The existence of hyperlinks to URLs that no longer exist is certainly a limitation for link analysis, but we consider it a lesser evil because it does in fact reflect the visibility that existed at some point in the past. This issue is addressed by Majestic through the provision of two databases, the Fresh Index and the Historic Index. The former only provides data for the last three months, while the latter gives the total available data, where there is a greater likelihood of finding broken links (directed to web pages that no longer exist or were simply eliminated at the source location).
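When link counts from tweets are attributed to company domains, shortened URLs (t.co, goo.gl, bit.ly, fb.me) first have to be resolved. A minimal sketch of doing so by following HTTP redirects; broken destinations simply surface as non-200 status codes:

```python
import requests

def resolve(short_url):
    """Follow redirects and return the final URL and HTTP status code (sketch)."""
    try:
        # Some servers mishandle HEAD requests; fall back to GET if needed.
        r = requests.head(short_url, allow_redirects=True, timeout=10)
        if r.status_code >= 400:
            r = requests.get(short_url, allow_redirects=True, timeout=10)
        return r.url, r.status_code
    except requests.RequestException as exc:
        return short_url, f"error: {exc}"

# The two t.co links quoted in the Alexion example above.
for u in ("https://t.co/z1Q1EMjHvC", "https://t.co/FOi6loMMEf"):
    final_url, status = resolve(u)
    print(u, "->", final_url, status)
```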
5.4 Usage
The usage category is a challenge for cybermetrics since these data are often accessible only and exclusively through web analytics applications linked to particular websites and are therefore available only to users with access privileges. The inclusion of sensitive data (e.g., linked to e-commerce sales data) is an inducement for this structured and valuable information to be stored in data silos that search engines are not able to reach. For this reason, it becomes necessary to rely on external applications to provide usage data, such as the number of visits, page views, and time spent on the page. However, the quality of these data is often low and highly skewed. Platforms of note are Alexa (http://www.alexa.com), SEMrush (https://www.semrush.com), and SimilarWeb (https://www.similarweb.com). The added problem of these platforms is that they offer freemium services. This means that some data are offered for free, but the most meaningful data (as well as features that allow a large number of URLs to be searched at once) require some kind of subscription, as this is the main business model of these services.
5.4.1 Data gathering

We decided to use the SEMrush Traffic Analytics tool to capture the available usage metrics (traffic rank, unique visitors, pages per visit, average visit duration, bounce rate, and traffic source). However, we only obtained data for six companies (Amgen, Biogen, Gilead, Illumina, QIAGEN, and Shire), which were, at the time, the companies in the sample with the most web traffic. This perfectly illustrates another problem associated with capturing usage data through external platforms: Only a few companies in a sector (generally the largest or most active on the web) will have data in quantities that are sufficiently significant to appear in the usage statistics of these external applications. After ruling out SEMrush, we finally opted for Alexa and SimilarWeb for data extraction. All data were extracted on January 9, 2017.
5.4.1.1 Alexa

Available usage data for each web domain were retrieved manually from the Traffic Statistics portal (http://www.alexa.com/siteinfo/company.com). The following data were extracted:

– Daily Time on Site: Average value for dwell time on the website (measured in seconds) per visitor per day.
– Daily Pageviews per Visitor: Average number of page views per visitor per day.
– Bounce Rate (%): Percentage of visitors to a particular website who navigate away from the site after viewing only one page (no real interaction based on navigation).
These metrics are intended to capture the three main subcategories of usage metrics (visitors, visits, and interactions). Of these metrics, perhaps bounce rate has garnered the greatest fame in the world of web analytics (Muñoz & Elósegui, 2011), because of its ability to reflect the level of interaction of users visiting a website. If a user accesses the web space of a company (from any page) and leaves the site without having browsed it or viewed any other content, we might infer from this that the content was not very interesting. However, if the visitor has viewed numerous pages and has spent a long time on each before leaving, perhaps the information provided on the page that he or she first reached (the landing page) was highly useful. Despite this overwhelming logic, we should proceed with caution, as this metric "sees" the interaction exclusively through page views; a user may have been interacting on the landing page, watching several videos, each of several minutes' duration, or reading interesting but extensive content, so the bounce rate should be adjusted according to the characteristics of each page (Kaushik, 2009). This calls into question the validity of this metric when making comparisons between websites. Nevertheless, it is useful to capture the metric and to subsequently analyze it in order to obtain the maximum amount of possible data for each web domain in each of the categories of the analysis model. Chapter 9, The refinement, will, in fact, look at how to compare and combine all these dimensions.
5.4.1.2 SimilarWeb

Data were captured from this source following a similar procedure to that used for Alexa, in this case through the following URL: https://www.similarweb.com/website/company.com. Besides bounce rate, the following metrics were extracted:

– Total Visits: Total number of visits to a website, including both desktop and mobile web data.
– Average Visit Duration: Average length of a visit (time spent on the site per session).
– Pages per Visit: Average number of page views per visit.
We can see that, as is the case with the nomenclature used to describe the hyperlinks between websites, there are no standardized forms when naming the same metrics. The Average Visit Duration of SimilarWeb and SEMrush corresponds “approximately” to Alexa’s Daily Time on Site; SimilarWeb’s Pages per Visit resembles Alexa’s Daily Pageviews per Visitor. But the differences can sometimes be subtle linguistic distinctions, which are, nevertheless, decisive in a web audience analysis (Bermejo, 2007). Visits and visitors are different dimensions because a visitor (user) can have multiple visits (sessions). Hence, the differences between Alexa data and SimilarWeb data concerning pages visited can be very pronounced, depending on whether the average is based on the number of visitors (Alexa) or visits (SimilarWeb). Therefore, it is important to know the exact procedures followed by each platform to capture the metrics that they provide. In the present case, it is crucial to understand that Alexa data refer to averages obtained during the last three available months while in the case of SimilarWeb the period extends to six months; so some metrics, despite measuring very similar things, may not be comparable at all. Consider, for example, that the period of time that SimilarWeb works with includes the Christmas holidays and that this is not the case with Alexa. The high seasonality of the data (in holiday season, visits to a web space may shoot up and then drop dramatically) is a key external variable to take into account when working with usage metrics.
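A toy calculation, with invented figures, makes the visits/visitors distinction concrete:

```python
# Invented monthly figures for a single website.
visitors = 1000    # unique users
visits = 1500      # sessions; a visitor can return several times
pageviews = 4500

pages_per_visitor = pageviews / visitors   # Alexa-style denominator -> 4.5
pages_per_visit = pageviews / visits       # SimilarWeb-style denominator -> 3.0
print(pages_per_visitor, pages_per_visit)
```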
5.4.2 Results

The various nuances in play when calculating similar usage metrics on different external platforms are quite evident in the data presented in Table 5.5, a Spearman correlation between three similar metrics offered by Alexa and SimilarWeb. While there is an understandable difference between SimilarWeb's Pages per Visit and Alexa's Daily Pageviews per Visitor (the difference between counting visits and visitors can be significant, as discussed above), the data on the average duration of visits are entirely different (Rs = 0.13; α < 0.1; P-value: 0.085). In this case, the different data windows (three months for Alexa and six months for SimilarWeb) could affect the measurements. However, the most striking difference is undoubtedly the weak correlation between the bounce rate offered by each platform (Rs = 0.31; α < 0.1; P-value < 0.0001). In this case, perhaps the fact that SimilarWeb includes both desktop and mobile web data has been decisive, but we really cannot know for sure.

Table 5.5 Correlation between SimilarWeb and Alexa usage metrics

  SimilarWeb               Alexa: Bounce rate (%)   Alexa: Daily pageviews/visitor   Alexa: Daily time on site
  Bounce rate (%)          0.31a                    --                               --
  Pages/visit              --                       0.27a                            --
  Average visit duration   --                       --                               0.13

a Significant values at the significance level α < 0.010 (two-tailed test).

Again, the different procedures of each platform are decisive. SimilarWeb indicates that its data come from four main sources: (1) A panel of monitored devices; (2) local internet service providers (ISPs) located in many different countries; (3) web crawlers that scan every public website to create a highly accurate map of the digital world; and (4) hundreds of thousands of direct measurement sources from websites and apps that are connected to SimilarWeb directly (https://www.similarweb.com/ourdata). Meanwhile, Alexa states that its data come from "a sample of millions of Internet users using one of over 25,000 different browser extensions. In addition, we gather much of our traffic data from direct sources in the form of sites that have chosen to install the Alexa script on their site and certify their metrics" (http://www.alexa.com/about).

Since the only data relating to total visits were collected by SimilarWeb, this source will be used to understand the relationships between the different usage metric categories. Table 5.6 reflects the almost nonexistent relationship between the total number of visits and the other parameters (length of visits, page views per visit, and bounce rate).

Table 5.6 Correlation between SimilarWeb usage metrics

  SimilarWeb               Total visits   Average visit duration   Pages/visit   Bounce rate
  Total visits             1
  Average visit duration   0.02           1
  Pages/visit              -0.05          0.67a                    1
  Bounce rate              -0.01          -0.19a                   -0.42a        1

a Significant values at the significance level α < 0.010 (two-tailed test).
Sometimes common sense (perhaps the least common of all senses) should help us to understand the results. More visits are a sign that the created web resources have been used, but not necessarily a sign of visibility per se. For the same reason, the number of visits is framed within the category of usage as an indicator of consumption. Receiving many visits does not have to correspond to a long user dwell time on a website or to the number of pages visited (which directly affects the bounce rate values). However, we did observe a relationship between the length of the visit and the number of page views: If, on average, more pages are visited, the average dwell time will logically have been longer in most cases (though not necessarily always). Similarly, bounce rate and the number of pages also show a stronger relationship, in this case a negative one, since the more pages that are visited, the lower the bounce rate will be.

In any case, there is always the possibility of encountering websites whose performance, for one reason or another, is excessively high or low, thus affecting the statistical results (this statement applies to the usage metrics of this section, the size and mention metrics seen in previous sections, or any other set of web data from any other category). Sometimes there is a tendency to eliminate the maximum and minimum values in statistical analyses in order to iron out the results, although this can be a dangerous procedure if not backed up by any statistical evidence. Taking the data on web usage collected from SimilarWeb, we applied the Dixon test to detect outliers among the 184 observations. Table 5.7 shows the results of the analysis, which reflect the existence of some websites with excessive values. The possible exclusion of these websites from the final analysis will depend largely on the objectives of the researcher or practitioner. The effects on the correlation matrices between indicators will be minimal, but on statistics of position they could be noteworthy.

Table 5.7 Outlier detection in SimilarWeb usage metrics (Dixon test)

  Metric                   Outliers   P-value   Maximum   Minimum
  Total visits             7          0.004     Yes       No
  Average visit duration   10         0.502     No        No
  Pages/visit              11         0.170     No        No
  Bounce rate              11         0.809     No        No

The P-value has been computed using 1,000,000 Monte Carlo simulations. Significance level α < 0.05.

In this case, amongst the observations considered outliers, only one corresponds to a maximum value, namely the number of total visits received by illumina.com, which belongs to the company Illumina (601,500 visits). Fig. 5.3 illustrates the distribution of outliers for the total visits calculated by SimilarWeb, in which, apart from Illumina, the companies that stand out are Shire, QIAGEN, and Amgen, with a number of visits substantially higher than the other biotechnology companies in the sample. These performance figures may be due to the considerable reach of these companies or to a defect in the measurement tools.

Figure 5.3 Biotechnology outliers in total visits from SimilarWeb (Dixon test).

The fact that our analytical model is based on many different metrics and dimensions should help explain the existence of outliers in the case of certain metrics. If Illumina stands out just as evidently in many other metrics, we would be able to say with some confidence that the number of visits is a true reflection of its web performance, even if it occupies a different order of magnitude from the other companies in the sample.
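Dixon's test statistic itself is straightforward; the sketch below computes the simplest variant (the Q ratio for the largest observation). The Monte Carlo P-values reported in Table 5.7 are not reproduced here, and the visit figures other than Illumina's are invented.

```python
def dixon_q_max(values):
    """Dixon's Q ratio (r10) for the largest observation: gap to its nearest
    neighbour divided by the range of the sample."""
    x = sorted(values)
    return (x[-1] - x[-2]) / (x[-1] - x[0])

# Illumina's 601,500 total visits is from the text; the rest are invented.
visits = [601500, 180000, 150000, 90000, 40000, 12000, 8000, 3000]
q = dixon_q_max(visits)
print(round(q, 3))  # compare against a critical value or a simulated P-value
```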
5.5 Formal
After the quantitative metrics of size, mention, and usage, in this last section we are going to explore more qualitative metrics, even though they originate from quantitative data. To do this, we shall navigate through metrics derived from web usability and SEO.
5.5.1 Data gathering

The sources used in this section are TAWDIS, PageSpeed Insights, and Google. We shall first look at the captured metrics and, secondly, at the results obtained.
5.5.1.1 TAWDIS

The first set of formal metrics was captured from the number of errors and warnings reported by an automatic web accessibility test, in this case TAW (Test de Accesibilidad Web, Web Accessibility Test), developed by the CTIC Foundation (http://www.fundacionctic.org) and available online for free (http://www.tawdis.net). For Level A web accessibility conformance (WCAG 2.0), the least restrictive of the three levels (A, AA, and AAA), the total number of errors and warnings was collected for each of the 184 web domains in the sample of biotechnology companies. Data were collected during the last week of December 2016.
5.5.1.2 PageSpeed Insights

Page load speed is a quality-related metric because load speeds that are too slow can create a poor browsing experience, causing users to abandon the site. This free source, designed by Google (https://developers.google.com/speed/pagespeed/insights), provides a page speed index that gives a score on a scale of 0 to 100, indicating the degree of load speed optimization. The data were collected for each website in the sample, for both desktop and mobile versions, during the last week of December 2016.
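Page speed scores can also be collected through the PageSpeed Insights API rather than the web form. The sketch below uses the current v5 endpoint, which postdates the data gathering described here, so the exact response layout (lighthouseResult / categories / performance / score) should be checked against Google's documentation before use.

```python
import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def pagespeed_score(url, strategy="desktop", api_key=None):
    """Return the 0-100 performance score for a page (sketch, PSI API v5)."""
    params = {"url": url, "strategy": strategy}  # strategy: "desktop" or "mobile"
    if api_key:
        params["key"] = api_key
    data = requests.get(PSI_ENDPOINT, params=params, timeout=60).json()
    # Lighthouse reports the performance category score on a 0-1 scale.
    return round(data["lighthouseResult"]["categories"]["performance"]["score"] * 100)

print(pagespeed_score("https://www.qiagen.com", strategy="mobile"))
```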
5.5.1.3 Google

Finally, SEO data were acquired. In this case, we treated the position that a company website occupies on a search engine results page, following a specific query, as an indicator of website optimization. If a company sells telephones, it would be strange for that company to appear among the top results for a search engine query such as "travel Las Vegas". However, if the query were "buy telephone", then that company appearing amongst the top results would be a true indication of quality (of the web space). Google—or any search engine—would be "seeing" the website in question not only as a place for telephones but, amongst such places, as a leader, and would therefore list it in a privileged position; this is a prime consideration given that users rarely go beyond the first results page when conducting a web search. Following this logic, we calculated the position of companies on Google's results page for a particular query, in this case the first word of the company name (e.g., "Acadia" instead of "Acadia Pharmaceuticals"). In this way we were able to test the ability of the company website to place itself at the top of the results page when a user searches for the term that corresponds to the name of the company, or at least a part of it. As with the other formal metrics, data were collected during the last week of December 2016, in order to have a pool of data gathered in the same timeframe.
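Once the ordered list of result URLs for a company-name query has been obtained (here manually; a search API would work equally well), determining the rank reduces to finding the first result that belongs to the company's domain. A minimal sketch with an invented results list:

```python
from urllib.parse import urlparse

def serp_rank(result_urls, company_domain):
    """1-based position of the first result on the company's domain, or None."""
    for position, url in enumerate(result_urls, start=1):
        host = urlparse(url).netloc.lower()
        if host == company_domain or host.endswith("." + company_domain):
            return position
    return None

# Invented results page for the query "Acadia"; domains are illustrative only.
results = [
    "https://www.nps.gov/acad/index.htm",
    "https://en.wikipedia.org/wiki/Acadia",
    "https://www.acadia-pharm.com/",
]
print(serp_rank(results, "acadia-pharm.com"))  # -> 3
```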
5.5.2 Results

Since we are now handling metrics that are more aligned with qualitative aspects, we decided to conduct a brief experiment, which consisted of categorizing the page speed scores. Although PageSpeed Insights already involves a qualitative process, in that it transforms a quantitative variable (seconds) into a bounded index (0 to 100), we went a step further by qualitatively categorizing this range. Scores of 0 to 49 were categorized as "fail", 50 to 69 as "sufficient", 70 to 89 as "good", 90 to 99 as "very good", and finally "excellent" was reserved for the maximum score of 100. In this way we managed to establish categories associated with a measurable indicator of quality. Table 5.8 summarizes the data obtained for both desktop and mobile versions.

Table 5.8 Page speed of biotech company websites

             1              2              3              4              5
  Type       N      %       N      %       N      %       N      %       N      %      Average   Median
  Desktop    37     20.1    62     33.7    74     40.2    6      3.3     0      0      64.5      68.0
  Mobile     55     29.9    95     51.6    27     14.7    3      1.6     0      0      54.8      56.5

(1): Fail (0–49); (2): Sufficient (50–69); (3): Good (70–89); (4): Very good (90–99); (5): Excellent (100).

The results shown in Table 5.8 clearly indicate several issues. Firstly, the "excellent" score is difficult to attain; this is made clear by the fact that no website achieved this score on either platform. Secondly, the page load speed of the desktop versions is, on average, higher than that of the mobile versions. We only need to look at the number of websites that scored at least a "good" in desktop mode (80) and in mobile mode (30). Moreover, none of the analyzed companies achieved an "excellent" score, with a modest average of 64.5 points for desktop versions and barely a "sufficient" (54.8) in the case of mobile versions. The company that achieved the best results was Ardelyx (95 points for both mobile and desktop versions).

Yet again we must be cautious. A website that loads quickly is positive, but it is not the same to have a website that contains three images and two paragraphs as it is to have a mega-site with endless lines of text, millions of videos, and loaded with graphics. These data cannot be used without context or without being combined with other metrics. However, the page load speed is based—fortunately for our purposes—on the homepage. While the analysis could be applied to any individual page, carrying it out on each and every one of the pages of a website is, in practice, unnecessary (and unworkable in terms of time and money). The analyses should be conducted primarily on pages designed as specific landing pages (the homepage or any other page, designed to provide certain content or specific services, to which users are redirected). It is on these pages that a bad user experience can be critical to the image of the company. Once again in our journey through web metrics, we are going to have to deal with an unequal distribution: A few pages will influence the perception and satisfaction of users, while most will go unnoticed or will have zero effect.

Indeed, it is very interesting to observe the effects of some formal metrics on others. For this reason, we combined the page speed metrics with the number of errors and warnings identified by the TAW analysis. Fig. 5.4 shows the average number of errors and warnings based on the qualitative level obtained in the page speed test for desktop platforms (fail, sufficient, good, very good). It is no coincidence that, on average, the number of web usability errors or warnings is higher for sites with low speed ratings (35.4 errors/warnings for websites with a page speed score lower than 50 points) and vice versa (10.4 errors/warnings for websites with a page speed higher than 90 points). Obviously, the number of websites with a rating of "very good" is lower (6) than the number with a "fail" (37). Nevertheless, these data, averaged out, indicate a trend that should be borne in mind: Poorly built websites generate a number of problems (including usability problems and slow load speeds) that can affect the perception of web content quality, not only amongst end users but also for search engines, which penalize these websites with a lower position on the search engine results pages.
Figure 5.4 Relationship between accessibility errors and page speed.
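The banding used in Table 5.8 and the aggregation behind Fig. 5.4 can be expressed in a few lines; a sketch, with invented (score, errors) pairs standing in for the 184 observations:

```python
def speed_band(score):
    """Map a 0-100 PageSpeed score to the qualitative bands used in Table 5.8."""
    if score <= 49:
        return "fail"
    if score <= 69:
        return "sufficient"
    if score <= 89:
        return "good"
    if score <= 99:
        return "very good"
    return "excellent"

def mean_errors_by_band(records):
    """Average number of TAW errors/warnings per speed band (the Fig. 5.4 view)."""
    grouped = {}
    for score, errors in records:
        grouped.setdefault(speed_band(score), []).append(errors)
    return {band: sum(v) / len(v) for band, v in grouped.items()}

# Invented (desktop PageSpeed score, TAW errors + warnings) pairs.
sample = [(38, 44), (45, 31), (62, 25), (75, 18), (92, 9), (95, 12)]
print(mean_errors_by_band(sample))
```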
On this last point, Table 5.9 shows the results of the simple SEO analysis that we performed on our sample of biotech companies. We can see that the results were, in principle, extremely positive. In total, 65.8% of the 184 websites appear in first position on the search engine results page (SERP), while only 19% were left outside the top 10 results.

Table 5.9 Optimization of biotech companies in Google (title query)

  SERP position (r)   N
  1                   121
  2                   4
  3                   9
  4                   2
  5                   4
  6                   2
  7                   2
  8                   4
  9                   1
  10                  0
  >10                 35
  Total               184

Continuing with our intention to plunge deeper into the results to extract information that takes us beyond the raw data, we need to take into account a number of more qualitative circumstances. Our queries searched for the first word of the company name. It is logical to assume that if we write "Eagle Pharmaceuticals", the first result will correspond to this company (if not, we would have a problem on our hands). It is another matter entirely if the user just types "Eagle", expecting the same result. Similarly, it is not the same when the first term is "Five" as when it is "Zogenix". For this reason, it is highly noteworthy that Array BioPharma appears at position 39 for the search term "Array", or that The Medicines Company comes in twelfth when we search for "Medicines", even though both companies ended up outside the top 10 results. In short, to contextualize the SEO results, we must always consider not only the term used to search for the results but also the nature of the company name itself.

An interesting exercise consists in finding out how the companies in our sample rank on the Google search results page for a generic, but highly relevant, query, "biotechnology":

1. bio.org
2. en.wikipedia.org/wiki/Biotechnology
3. nature.com
4. khanacademy.org
5. journals.elsevier.com
6. ncbiotech.org
7. cell.com
8. sciencedirect.com
9. merriam-webster.com
10. unitybiotechnology.com
If we look at the top 10 results, we can see that none of the companies in the sample occupies these privileged positions; instead, the top-ranked hits are for scientific journals (Cell, Nature), publishing platforms (Elsevier), reference works (Wikipedia, Merriam-Webster), organizations (Biotechnology Innovation Organization), or research centers (North Carolina Biotechnology Center).
References

Aguillo, I. F., Ortega, J. L., & Granadino, B. (2006). Contenidos del buscador Google. Distribución por países, dominios e idiomas. El profesional de la información, 15(5), 384–389. Available from http://dx.doi.org/10.3145/epi.2006.sep.07.
Anderson, C. (2006). The long tail: Why the future of business is selling less of more. New York: Hachette Books.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512. Available from http://dx.doi.org/10.1126/science.286.5439.509.
Bar-Ilan, J. (1999). Search engine results over time: A case study on search engine stability. Cybermetrics, 2(3), 1–3.
Bar-Ilan, J. (2001). Data collection methods on the Web for infometric purposes—A review and analysis. Scientometrics, 50(1), 7–32. Available from http://dx.doi.org/10.1023/a:1005682102768.
Bar-Ilan, J. (2002). Methods for measuring search engine performance over time. Journal of the American Society for Information Science and Technology, 53(4), 308–319. Available from http://dx.doi.org/10.1002/asi.10047.
Bermejo, F. (2007). The internet audience: Constitution & measurement. New York: Peter Lang Publishing.
Hellsten, I., Leydesdorff, L., & Wouters, P. (2006). Multiple presents: How search engines rewrite the past. New Media & Society, 8(6), 901–924. Available from http://dx.doi.org/10.1177/1461444806069648.
Holmberg, K. (2016). Altmetrics for information professionals: Past, present and future. Oxford: Chandos Publishing.
Jansen, B. J., & Molina, P. R. (2006). The effectiveness of Web search engines for retrieving relevant ecommerce links. Information Processing & Management, 42(4), 1075–1098. Available from http://dx.doi.org/10.1016/j.ipm.2005.09.003.
Kaushik, A. (2009). Web analytics 2.0: The art of online accountability & science of customer centricity. Indianapolis: Wiley.
Lewandowski, D., Wahlig, H., & Meyer-Bautor, G. (2006). The freshness of web search engine databases. Journal of Information Science, 32(2), 131–148. Available from http://dx.doi.org/10.1177/0165551506062326.
Muñoz, G., & Elósegui, T. (2011). El arte de medir. Manual de analítica web. Barcelona: Bresca (Profit Editorial).
Orduna-Malea, E., & Aguillo, I. F. (2014). Cibermetría: midiendo el espacio red. Barcelona: Editorial UOC.
Orduna-Malea, E., Torres-Salinas, D., & Delgado López-Cózar, E. (2015). Hyperlinks embedded in Twitter as a proxy for total external in-links to international university websites. Journal of the Association for Information Science and Technology, 66(7), 1447–1462. Available from http://dx.doi.org/10.1002/asi.23291.
Ortega, J. L. (2014). Academic search engines: A quantitative outlook. Oxford: Chandos Publishing.
Ortega, J. L. (2016). Social network sites for scientists: A quantitative survey. Cambridge, MA: Chandos Publishing.
Pirkola, A. (2009). The effectiveness of Web search engines to index new sites from different countries. Information Research: An International Electronic Journal, 14(2), 396. http://www.informationr.net/ir/14-2/paper396.html. Accessed 18.03.17.
Thelwall, M. (2001). The responsiveness of search engine indexes. Cybermetrics, 5(1), 8.
Thelwall, M. (2008a). Extracting accurate and complete results from search engines: Case study Windows Live. Journal of the American Society for Information Science and Technology, 59(1), 38–50. Available from http://dx.doi.org/10.1002/asi.20704.
Thelwall, M. (2008b). Quantitative comparisons of search engine results. Journal of the American Society for Information Science and Technology, 59(11), 1702–1710. Available from http://dx.doi.org/10.1002/asi.20834.
Thelwall, M. (2010). Webometrics: Emergent or doomed? Information Research: An International Electronic Journal, 15(4), 28. http://www.informationr.net/ir/15-4/colis713.html. Accessed 18.03.17.
Thelwall, M. (2014). A brief history of altmetrics. Research Trends, 37, 3–4.
Thelwall, M., & Sud, P. (2011). A comparison of methods for collecting web citation data for academic organizations. Journal of the American Society for Information Science and Technology, 62(8), 1488–1497. Available from http://dx.doi.org/10.1002/asi.21571.
Uyar, A. (2009). Investigation of the accuracy of search engine hit counts. Journal of Information Science, 35(4), 469–480. Available from http://dx.doi.org/10.1177/0165551509103598.
Van Couvering, E. (2010). Search engine bias: The structuration of traffic on the World Wide Web. Doctoral dissertation, The London School of Economics and Political Science, University of London, UK.
Vaughan, L. (2015). Uncovering information from social media hyperlinks: An investigation of Twitter. Journal of the Association for Information Science and Technology, 67(5), 1105–1120. Available from http://dx.doi.org/10.1002/asi.23486.
Vaughan, L., & Thelwall, M. (2004). Search engine coverage bias: Evidence and possible causes. Information Processing & Management, 40(4), 693–707. Available from http://dx.doi.org/10.1016/s0306-4573(03)00063-3.
Vaughan, L., & Zhang, Y. (2007). Equal representation by search engines? A comparison of websites across countries and domains. Journal of Computer-Mediated Communication, 12(3), 888–909. Available from http://dx.doi.org/10.1111/j.1083-6101.2007.00355.x.
Wilkinson, D., & Thelwall, M. (2013). Search markets and search results: The case of Bing. Library & Information Science Research, 35(4), 318–325. Available from http://dx.doi.org/10.1016/j.lisr.2013.04.006.