Library and Information Science Research 41 (2019) 109–122
Library and Information Science Research journal homepage: www.elsevier.com/locate/lisres
State of the art review
Global perspectives of research data sharing: A systematic literature review
Winner Dominic Chawinga, Sandy Zinn ⁎
Department of Library & Information Science, University of the Western Cape, Private Bag X17, Bellville 7535, Cape Town, South Africa
ABSTRACT
Studies investigating data sharing from a world perspective are seemingly rare. By employing a quantitative design, this systematic review investigates and presents a comprehensive account of factors hampering data sharing at three levels of the global research hierarchy (individual, institutional and international). The study analyses secondary data extracted from 105 publications (n=105). Journal publishers and research grant organisations are key players in promoting data sharing activities by formulating, adopting and implementing policies on data sharing. Despite concerted efforts to promote data sharing, various factors frustrate these initiatives; they include lack of time and data misappropriation (individual level); data sharing training, absence of compensation and unfavourable internal policies (institutional level); and weak policies, ethical and legal norms, lack of data infrastructure and interoperability issues (international level). To counter these challenges, there is a need for research stakeholders to recognise researchers who share data through data citations, acknowledgement and incentives; invest in infrastructure, conduct training and advocacy programs; formulate stringent and fair policies. Data sharing will only become a success if research stakeholders apply equal efforts in managing data to that of research publications in general. The study offers a unique and comprehensive account of factors hampering data sharing from a global perspective. Solutions suggested could be adopted by research stakeholders in their efforts to enhance data sharing activities at various research levels.
1. Introduction Each time an editor communicates a congratulatory message to researchers about their accepted manuscript, they feel grateful while looking forward to seeing their scientific paper published and shared with peers in their field of study. However, Wicherts and Bakker (2012, p.73) ask an important question worth reflecting upon: “So why not go all the way, and publish your raw data too?” Proponents of data sharing vigorously argue that data sharing is inevitable in a modern world where part of new knowledge is discovered by re-analysing previously generated data. For example, Woolfrey (2009) argues that it is important to make data open so that some groups of researchers such as students, who may struggle to obtain important survey findings through informal channels, can re-use such data. Although research data has been considered a by-product of research activities whilst research papers or reports have been regarded as primary products, research stakeholders are increasingly attaching value to research data. The reason is that in recent years it has been proven that research data is a commodity worth managing and preserving at disciplinary and national level (Cragin, Palmer, Carlson, & Witt, 2010; Davenport & Patil, 2012; Matlatse, 2016; Wicherts & Bakker, 2012). Data is described by Wallis, Rolando, and Borgman (2013) as the cornerstone of science. These factors have been the foundation for driving a paradigm shift in the way research data is viewed, and have consequently led to an upsurge of campaigns to make research data freely accessible to the general public. The concept of sharing research data has attracted
unprecedented interest and support from various research stakeholders such as research funders, journal publishers, and some scientists who have formed joint forces to turn the world into a data sharing entity (Fecher, Friesike, & Hebing, 2015; Guedon, 2015; Matlatse, 2016; Wallis et al., 2013). The data sharing campaigners are achieving their intended goals, but not without some frustrations. The concept of data sharing is complex as it operates amidst an interplay of factors on a continuum where some factors create a flourishing data sharing environment while others act in opposition. Taking into consideration that the concept of data sharing is multifaceted, it becomes important to conduct a comprehensive literature review with the aim of identifying solutions that could best be used to deal with challenges that undermine efforts to popularise, embrace, and implement the concept of data sharing. Hence, the current study.

1.1. Problem statement

The literature shows that previous studies have focused on research data management in general which encompasses various concepts that include data creation, data preservation, data infrastructure, data sharing, data re-use, and data management skills (Chen & Wu, 2017; Housewright, Schonfeld, & Wulfson, 2013; Koopman, 2015; Parr, 2007; Sansone & Rocca-Serra, 2012; Schumacher & VandeCreek, 2015; Scott, 2014; Shakeri, 2013; Teeters, Harris, Millman, Olshausen, & Sommer, 2008). Although it is scientifically feasible, studying the concept of research data management as a composite concept implies that most of
⁎ Corresponding author. E-mail address: [email protected] (W.D. Chawinga).
https://doi.org/10.1016/j.lisr.2019.04.004 Received 16 October 2018; Received in revised form 2 April 2019; Accepted 15 April 2019 Available online 04 June 2019 0740-8188/ © 2019 Elsevier Inc. All rights reserved.
the specific elements, such as data sharing, are investigated and understood superficially. Data sharing is at the centre of all research data management activities (Wallis et al., 2013). Since issues involving data sharing are not thoroughly investigated and understood, pursuing this concept is becoming increasingly problematic. The implication is that all efforts and resources channelled towards research data management activities may not achieve the intended goals. It is in this vein that, unlike many prior studies that focused on research data management in general, this study sets out to investigate one component, that of data sharing. The focus is on investigating and finding solutions to challenges that stymie data sharing. In addition, a large body of literature seems to suggest that the success of data sharing is dependent on the aggressiveness of journal publishers and research grant organisations in imposing conditions on researchers to share data (Charbonneau, 2013; Chen & Wu, 2017, p. 346; Davenport & Patil, 2012; Enke et al., 2012; Fecher et al., 2015; Guedon, 2015; Matlatse, 2016; Wallis et al., 2013). Although this strategy seems practical and it is achieving the intended goal, it may scare some researchers and discourage them from sharing their data. In terms of research sites, prior similar studies have focused on particular countries or regions thereby providing country- and region-specific solutions to challenges affecting data sharing. Other studies have also focussed on investigating particular disciplines. Investment in a discipline-specific data sharing approach or central repository comes with its strengths and weaknesses. One key strength is that individual institutions or departments/sections maintain control and ownership over the data, implying the risk of losing such data due to external factors is low.
On the other hand, the risk of losing data maintained by a community-driven approach could be high if institutions or departments have little control over their data; external forces beyond the control of the local institutions could lead to permanent loss of data. However, there are challenges associated with a discipline specific approach. A discipline specific data sharing approach is expensive in terms of both staffing and technology and it fails to lend itself to geographical distribution; its data is restricted to users affiliated to the institution or department. Moreover, Walters and Skinner (2011) warn that a silo-based approach to research data management is neither cost-effective nor sustainable when compared to a community-driven or multi-institutional approach. The current study extends the contribution of prior studies in two ways. Firstly, the study enriches the literature in this research domain by offering insights about data sharing at individual, institutional, and international levels which are seemingly lacking in existing literature. Secondly, the study synthesises and proposes interventions that research stakeholders such as research grant institutions, government agencies, publishers, librarians, and scientists can adopt to improve data sharing at individual, institutional, and international research levels.
2. Literature review

2.1. The concept of data sharing: definition and history

Data sharing, interchangeably called open data, can be explained as a deliberate effort to make all raw research data fully available for public access (Dong & Li, 2016; Kaye et al., 2018; Ross, 2016; Schmidt, Gemeinholzer, & Treloar, 2016). Data should be made publicly accessible for the purpose of creating transparency, reproducibility, and driving further scientific discovery (Dong & Li, 2016; Kaye et al., 2018; Pisani & AbouZahr, 2010; Ross, 2016; Rowhani-Farid, Allen, & Barnett, 2017; Royal Society, 2012; Schmidt et al., 2016; Watson, 2015). In other words, data sharing can conclusively be defined as the disposition and preservation of data for public access with the purpose of providing access for reuse. Debates regarding data sharing and re-use are not new; however, they have become more intense in the 21st century. These debates first emerged in the 1980s when researchers became more concerned about the contribution that data sharing and re-use could make to the advancement of science (Fienberg, Martin, & Straf, 1985; Glaeser, 1990). The evolution of information and communication technologies has escalated the pace and magnitude of these debates in which proponents are currently enjoying unprecedented support from key research stakeholders. These technologies have extensively impacted the manner in which research data is generated, stored, and shared amongst researchers thereby stirring new perceptions of data sharing (Corti, Van den Eynden, Bishop, & Woollard, 2014; Kambatla, Kollias, Kumar, & Grama, 2014; Pryor, 2012; Wallis et al., 2013; Yoon, 2015).

2.2. Factors propelling data sharing

As pointed out earlier, pro-data sharing campaigners have emerged in Europe, Africa, Asia, and other parts of the world. Leading the campaign of data sharing are mega national, regional, and international organisations that are key funders of research projects. For instance, the National Science Foundation of the United States of America (USA) adopted a data sharing policy a long time ago demanding that researchers and research institutions receiving its funding deposit research data in open access repositories (Cohn, 2012). Similarly, the European Union (EU), a major research grant provider across the globe, demanded that from 2014 all data generated from research projects it funds should be accessible to the public for free (European Commission, 2012; Fecher et al., 2015). In Africa, Koopman (2015), Chiware and Mathe (2016, p.2), and Matlatse (2016) report that the National Research Foundation of South Africa, which is the key financier of research projects in South Africa and in some parts of Southern Africa, demands that all research activities it funds should have their resultant data publicly accessible for re-use purposes without imposing any embargoes.
Creditable journal publishers have joined forces with research funders in developing, adopting and supporting policies that compel researchers to share research data (Doorn, Dillo, & Van Horik, 2013; Ross, 2016). For example, Atmospheric Chemistry and Physics, F1000Research, Nature, and PLOS One have put in place policies that require researchers to submit their manuscripts alongside their research data (Fecher et al., 2015). Such overwhelming interest from key research stakeholders is enough to signal that new thinking in the way research data is handled and shared is not just beckoning but is a reality that research stakeholders have to embrace.

2.3. Challenges

Despite the debates involving data sharing being in favour of the proponents, progress has been compounded by many unresolved factors. Some studies (Fecher et al., 2015) have shown that, despite influences from policy makers, research funding bodies, and publishers of scientific scholarly works, researchers are somewhat unwilling to make their research data available to other researchers. A landmark study by Tenopir et al. (2011) which investigated research data sharing practices amongst 1329 researchers revealed that, for some reason, scientists do not make their research data electronically available to other researchers. Similarly, Rowhani-Farid and Barnett (2016) found that, from a total of 160 articles that were published in the British Medical Journal between 2009 and 2015, only three were published alongside their data. More importantly, although the value of sharing research data is highly recognised, “factors that hinder or prompt the reuse of data remain poorly understood” (Curty, Crowston, Specht, Grant, & Dalton, 2017, p.1). In Africa, a study by Anane-Sarpong et al. (2017) about data sharing practices in health sciences observes that data sharing is slow and unsatisfactory, compounded by financial constraints.
Other challenges confronting data sharing in Africa include lack of data sharing skills, poor or absent data sharing policies, and poor data infrastructure (Chigwada, Chiparausha, & Kasiroori, 2017; Chiware & Mathe, 2016, p.2; Koopman, 2015; Matlatse, 2016; Ng'eno, 2018). It is
against this background that the study is conducted to identify strategies for strengthening research data sharing at global level.

3. Methods

A systematic literature review is defined by Cook, Mulrow, and Haynes (1997) as a form of literature review that assembles, critically appraises or evaluates, and synthesises the results of primary studies in an integrative approach. As in Ullah and Ameen (2018), this study used a quantitative research approach to analyse secondary data synthesised across various primary studies. McKibbon (2006, p. 208) states that after presenting a problem, in any systematic literature review, other steps that need to be followed include “the identification of potential studies or data sources, selection of studies/sources, data extraction, combining and analyzing the data, and presentation of the findings.”

3.1. Selection of studies/sources (inclusion and exclusion criteria)

Cook et al. (1997) and Matteson, Salamon, and Brewster (2011) emphasise the need to specify inclusion and exclusion criteria for selecting articles that are analysed in the review. McKibbon (2006, p. 204) mentions additional inclusion criteria such as terms, years, language restrictions, and other limits. To reduce personal bias and improve consistency and credibility of the findings, most of the inclusion and exclusion strategies highlighted by Matteson et al. (2011) and McKibbon (2006, p. 204) were applied in this study. The following inclusion criteria were employed:
• Journal articles, case reports, theses/dissertations, and books which reported on data sharing
• Empirical scholarly documents which used quantitative design or qualitative design or both
• Materials written in the English language
• Materials published in 2007 or later

The following exclusion criteria were employed:

• Conference papers, letters, editorials, book reviews, research notes, and short communications
• Articles not published in the English language
• Publications earlier than 2007

The authors implemented inclusion and exclusion criteria following a widely held view that “by predefining and adhering to the selection criteria, bias in choosing studies for inclusion is minimised” (McKibbon, 2006, p. 210). The researchers chose the year 2007 as a baseline for two key reasons. A report prepared by the Association of Research Libraries as commissioned by the National Science Foundation of the USA strongly recommended that research stakeholders find better ways of managing and making research data accessible to potential users (Friedlander & Adler, 2006). It was further observed by Heidorn (2008) that by 2007, there seemed to be uncoordinated management and sharing of research data resulting from data production projects as well as medium and small projects which were funded by public funds. The red flag raised by Heidorn (2008) was followed by influential research grants organisations enacting policies that required funded researchers to make their research data resulting from their projects freely accessible. Examples of these organisations include the National Science Foundation of the USA (Cohn, 2012; Heidorn, 2011), National Research Foundation of South Africa (Koopman, 2015; Chiware & Mathe, 2016, p. 2; Matlatse, 2016) and the European Union (Fecher et al., 2015). It was after this period that journal publishers contemplated adopting data sharing policies to compel researchers to share their data (Fecher et al., 2015).

3.2. Search strategy

According to Brettle (2009), search strategies commonly include searching various databases using a variety of search strategies such as tracking citations from relevant documents and asking experts for recommendations. In the current study, two fundamental search criteria were employed, and they include the formulation of key search terms (McKibbon, 2006, p. 204) and citation tracking (Brettle, 2009, p. 45). The search statements were constructed using Boolean logic as follows: “research data sharing AND researchers”; “digital curation AND data sharing”; “research data sharing AND open science”; and “research data sharing AND open data”. These search statements were used to search articles in well-known databases that included ScienceDirect, Taylor & Francis, Springer (BioMed Central), Sage, Wiley Online Library, PLOS One, and Google Scholar. Through citation tracking, articles from other databases were retrieved. The number of citations for each source was obtained from Google Scholar. Based on the inclusion and exclusion criteria, Boolean search statements, and citation tracking, 105 sources were finally included for analysis as can be seen in Fig. 1.

Fig. 1. PRISMA flow diagram (Moher, Liberati, Tetzlaff, & Altman, 2009:267) reprinted with permission from the authors.
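The Boolean search statements and the selection criteria described in Sections 3.1 and 3.2 can be expressed as a small screening filter. The following is only a minimal sketch, not the authors' actual procedure (the review reports using Microsoft Excel): the `Record` class, its field names, and the sample records are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass

# Boolean search statements quoted in Section 3.2.
SEARCH_STATEMENTS = [
    "research data sharing AND researchers",
    "digital curation AND data sharing",
    "research data sharing AND open science",
    "research data sharing AND open data",
]

# Document types named in the inclusion and exclusion lists of Section 3.1.
INCLUDED_TYPES = {"journal article", "case report", "thesis", "book"}
EXCLUDED_TYPES = {
    "conference paper", "letter", "editorial",
    "book review", "research note", "short communication",
}
BASELINE_YEAR = 2007  # publications earlier than 2007 are excluded


@dataclass
class Record:
    """Hypothetical bibliographic record; the field names are assumptions."""
    title: str
    doc_type: str
    language: str
    year: int
    empirical: bool


def is_eligible(record: Record) -> bool:
    """Return True only if the record meets every stated inclusion criterion."""
    return (
        record.doc_type in INCLUDED_TYPES
        and record.doc_type not in EXCLUDED_TYPES  # explicit, although the sets are disjoint
        and record.language == "English"
        and record.year >= BASELINE_YEAR
        and record.empirical
    )


# Made-up candidate records, used only to exercise the filter.
candidates = [
    Record("Hypothetical survey of data sharing", "journal article", "English", 2011, True),
    Record("Hypothetical editorial", "editorial", "English", 2015, False),
    Record("Hypothetical early study", "journal article", "English", 2005, True),
]
eligible = [r for r in candidates if is_eligible(r)]
print([r.title for r in eligible])
```

Keeping the criteria in named constants mirrors the point quoted from McKibbon (2006): the rules are fixed before screening begins, so the same test is applied to every candidate source.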
3.3. Data extraction and analysis

This process involved developing a protocol commonly called a data form (McKibbon, 2006, p. 211) containing predetermined elements that was used to capture categories or themes identified from the 105 studies that were included for analysis. The protocol had four key categories that included the benefits of data sharing, data sharing at individual level, data sharing at institutional level, and data sharing at international level. During the preliminary literature review, the researchers observed a lack of literature on data sharing on the three levels; most studies are conducted within a country or regional context. Observing this gap in the literature, the researchers decided to address data sharing on these three levels. Each of these categories contained two sub-categories which included motivating factors for data sharing and challenges affecting data sharing. To develop these sub-themes and the main theme of benefits of data sharing (which was one of the four key themes), the researchers randomly sampled a few papers from the 105, scanned through them to identify commonly occurring themes which were then used to structure the protocol. Taking a leaf out of the highly cited paper about writing systematic
Table 1. Data extraction for each study.
Attribute of data | Data description
SID | A unique identity for each paper
Authors | Names of all the authors
Publication date | The year the paper was published (2007–2018)
Study title | The name of the paper, appearing in the searching stage
Type of paper | Book chapter, journal, theses or workshop article
Institutional/national/regional/international | The countries covered by the primary studies
Research topics/themes | The description of the study topic/themes, such as benefits of data sharing, factors et cetera
Theory/models/conceptual framework | The theories adopted by the papers – data curation models
Methodology | The use of quantitative, qualitative, or mixed methodologies
Contexts/sectors | Description of the study area, in either research institutions or academic settings
Additional notes | Important information which did not fit into any of the above attributes

Fig. 2. Distribution of articles by sources and their citations (n = 83). [Bar chart of articles and citations (%) by source of articles: Taylor & Francis, PLOS One, ScienceDirect, Sage, Wiley Online Library, Springer, and Google Scholar.] Note. Percentages do not add up to 100 because they are calculated against the total number of publications (n = 105) and the total number of citations (n = 5028).
literature reviews (Dubé & Paré, 2003, p. 605), five articles were randomly selected and jointly coded by the authors. Furthermore, learning from Dubé and Paré (2003, p. 605) as a strategy to improve the clarity and precision of the coding system, another 20 papers - excluding the initial five (implying that a sample size of 25 was used) were randomly selected, scanned, and coded independently. Common themes emanating from each of the 20 papers were then used to make adjustments and improvements to the protocol's coding scheme resulting from the initial five papers. In line with Dubé and Paré (2003), Kitchenham (2007), McKibbon (2006, p. 211), and Okoli and Schabram (2010), the authors discussed and reconciled all disagreements regarding the attributes and upon reaching a consensus, adjustments were made to the initial coding scheme. Table 1 presents an abridged data extraction protocol displaying 10 key attributes, and with it, the researchers accurately recorded all information from the 105 studies. To extract relevant data for analysis, each study was scanned by researchers and relevant information was extracted using Microsoft Excel spreadsheets, a technique successfully used previously by other studies (Ahmed, Ahmad, Ahmad, & Zakaria, 2018) in conducting a systematic literature review. It should be acknowledged that most factors affecting data sharing at all levels are conceivably interrelated; the only difference is the breadth of their effect. In making decisions about which factor belongs to an individual, institutional, or international grouping, the researchers considered the apparent breadth of the effect. For instance, it is apparent that lack of skills affects data sharing at individual level, but the factor was promoted to and discussed at institutional level considering that individuals are subsets of institutions. 
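The two-stage random selection described above (five papers coded jointly, then a further 20 coded independently, 25 in all) could be reproduced along the following lines. This is a hedged sketch only: the study identifiers (`S001` to `S105`), the function name, and the fixed seed are assumptions introduced here so that the draw is repeatable; the source does not state how the random selection was carried out.

```python
import random


def draw_coding_samples(study_ids, pilot_size=5, refine_size=20, seed=0):
    """Draw a pilot sample, then a second sample that excludes the pilot papers."""
    rng = random.Random(seed)  # fixed seed (an assumption) makes the draw repeatable
    pilot = rng.sample(study_ids, pilot_size)
    pilot_set = set(pilot)
    remaining = [sid for sid in study_ids if sid not in pilot_set]
    refine = rng.sample(remaining, refine_size)
    return pilot, refine


# Hypothetical identifiers standing in for the SID attribute of Table 1.
study_ids = [f"S{n:03d}" for n in range(1, 106)]
pilot, refine = draw_coding_samples(study_ids)
print(len(pilot), len(refine), len(set(pilot) | set(refine)))
```

Sampling the second set from the papers that remain after the pilot draw guarantees that the two sets do not overlap, which is what makes the combined sample size 25.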
Using a protocol was necessary for consistency in extracting data across the sources thereby reducing personal bias and improving the objectivity of the findings on data sharing. The study investigated data sharing at the three levels because it was envisaged that although interrelated, factors that stimulate or discourage data sharing at each level are somewhat unique, hence the need to find solutions that best deal with challenges encountered at each level. Data is presented
numerically using frequencies and percentages. Microsoft Excel was used to code, compute frequencies and percentages, and to present findings in figures and tables. Themes reflected in the results section are discussed in an integrated way (Cook et al., 1997) in the discussion section.

4. Results

Results on data sharing were searched and retrieved from the following academic databases: ScienceDirect, Sage Journals Online, Springer, Wiley Online Library, and PLOS One. Additionally, the commonly used academic search engine, Google Scholar was also employed. Of the 105 publications included for synthesis, 83 (79%) were peer reviewed journal articles, 10 (9.5%) were theses, eight (7.6%) were reports, and four (3.8%) were books.

4.1. Sources and citations

Fig. 2 illustrates the annual number of published articles on research data sharing and citations from each database and Fig. 3 depicts the number of non-article publications which include theses, books, and reports and their citations. It has to be mentioned that percentages in both Fig. 2 and Fig. 3 do not individually add up to 100 because they are calculated against the total number of publications (n = 105) and the total number of citations (n = 5028). There are two key results. Firstly, results show that Springer, Wiley Online Library, Sage, and ScienceDirect have the highest number of publications with scores of 14 (13.3%), 13 (12.4%), 12 (11.4%), and 11 (10.5%) respectively. Secondly, results show that databases that contributed the most citations include PLOS One, ScienceDirect, and Wiley Online Library with scores of 1519 (30.2%), 998 (19.8%), and 737 (14.7%) in that order. PLOS One journals have lower numbers of publications (nine or 8.6%) but are highly cited totalling 1519 (30.2%). In total, the 105 articles had been cited 5028 times. These results suggest that a large number of articles were retrieved from major subscription databases including Springer,
Fig. 3. Distribution of formats (theses, books and reports) and their citations (n = 22). [Bar chart of number of formats and citations (%) by format: theses, books, and reports.] Note. Percentages do not add up to 100 because they are calculated against the total number of publications (n = 105) and the total number of citations (n = 5028).

Fig. 5. Data sharing publications by location (n = 105). [Chart of publication counts by location: USA, UK, China, South Africa, Canada, Germany, Netherlands, Other, and General.]
categorised as general. Publications under the general category include those that covered Egypt, Jordan, and Saudi Arabia; 14 countries in sub-Saharan Africa, India, Bangladesh, Thailand, Vietnam, Indonesia, and Papua New Guinea; Spain and Australia; Canada and USA; and Ghana and Tanzania. Many other papers falling under the general category were systematic literature reviews but, with different scopes from the current review, it was impossible to associate them with particular regions or countries. Countries forming the other category include Switzerland, Australia, Japan, Thailand, France, Kenya, Australia, Zimbabwe, and India. Considering that English was one of the inclusion criteria, it is not surprising that most publications were from the USA and the UK where English is a native language. The USA contributed about two-fifths of the publications (41.9%), followed by the UK, China, South Africa, Canada, Germany, and the Netherlands.
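The location percentages reported in Section 4.3 follow directly from the publication counts. The short sketch below recomputes them and checks that the categories account for all 105 publications; the counts are taken from the text, while the dictionary and variable names are illustrative assumptions.

```python
# Publication counts per location as reported in Section 4.3.
LOCATION_COUNTS = {
    "USA": 44,
    "General": 26,
    "UK": 10,
    "Other": 9,
    "China": 5,
    "South Africa": 4,
    "Canada": 3,
    "Germany": 2,
    "Netherlands": 2,
}

total = sum(LOCATION_COUNTS.values())

# Share of each location, as a percentage of all publications, to one decimal.
percentages = {
    location: round(100 * count / total, 1)
    for location, count in LOCATION_COUNTS.items()
}

# List locations from the largest to the smallest contributor.
for location, count in sorted(LOCATION_COUNTS.items(), key=lambda item: -item[1]):
    print(f"{location}: {count} ({percentages[location]}%)")
```

Because every publication is assigned to exactly one location category, the counts sum to the full sample, unlike the per-source percentages in Fig. 2 and Fig. 3.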
Wiley Online Library, Sage, and ScienceDirect. Although fewer articles were published in PLOS One which is an Open Access database, it had much higher numbers of citations. For instance, a publication by Tenopir et al. (2011) published in PLOS One had 692 (13.8%) citations. However, some individual papers published in subscription databases were also highly cited. An example is an article by Borgman (2012) published in Wiley Online Library which had a score of 549 (10.9%) citations. 4.2. Overall growth Fig. 4 indicates a steady growth in the annual number of publications on data sharing from 2007 to 2018 reaching a peak in 2012 and 2013. Since the study excluded publications before 2007, it is not possible to comment on the status of data sharing during the period of 2006 and earlier. Generally, from 2007 to 2018, 105 articles were published representing an approximate annual growth rate of 10 publications. Most articles were published in 2012 and 2013 where the number of publications reached 14 each.
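The citation shares and the yearly average quoted above can be checked with a few lines of arithmetic. This is a sketch under one stated assumption: the review window is taken as 2007 through July 2018 (per the caption of Fig. 4), that is, 11 full years plus seven months. The helper name `share` is illustrative.

```python
TOTAL_PUBLICATIONS = 105
TOTAL_CITATIONS = 5028


def share(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


# Citation counts for the two individual papers named in the text.
tenopir_share = share(692, TOTAL_CITATIONS)
borgman_share = share(549, TOTAL_CITATIONS)

# Assumption: the window runs from January 2007 to July 2018.
years_covered = 11 + 7 / 12
mean_per_year = TOTAL_PUBLICATIONS / years_covered

print(tenopir_share, borgman_share, round(mean_per_year, 1))
```

Under that assumption the average comes out at about nine publications per year, which is the same order as the approximate figure of 10 given in the text.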
4.4. Benefits of sharing research data

A number of studies reported on the benefits that the research community can accrue from data sharing. Six benefits were identified from the literature. Nearly all articles, that is 104 (99%), highlighted that data sharing advances science, 96 (91.4%) indicated that data sharing minimises research costs, 69 (65.7%) publications indicated that data sharing guards against research fraud, 40 (38.1%) reported that data sharing helps in policy formulation, and finally 30 (28.6%) articles reported that data sharing helps to save time on the part of researchers. Results show that data is an academic asset which contributes to advancement of science. In this context, data collected by researchers or research teams can be re-analysed and interpreted by other scientists using various approaches or techniques to discover new
4.3. Research regions The study examined publications based on regions or locations of publication, and findings are presented in Fig. 5. Specifically, the USA had 44 (41.9%), the United Kingdom (UK) had 10 (9.5%), China had five (4.8%), South Africa had four (3.8%), Canada had three (2.9%), and Germany and the Netherlands had two (1.9%) publications each. Nine (8.6%) publications were categorised as other and consisted of countries which had only one publication each, and finally there were 26 (24.8%) publications that covered more than one country
Fig. 4. Number of data sharing publications 2007-July 2018 (n = 105).
knowledge or breakthroughs. In this way, data sharing allows reproducibility of results (Elsayed & Saleh, 2018; Fecher et al., 2015; Fry, Lockers, Oppenheim, Houghton, & Rasmussen, 2008; McCullough, 2009; Pisani & AbouZahr, 2010; Watson, 2015). Results show further that data sharing helps to guard against research fraud. In this case, research fraud may come in different forms, but one obvious form involves falsifying methodologies. Data sharing can curb this malpractice by allowing independent researchers to re-analyse the data in order to confirm the validity of research findings. This helps to maintain research integrity thereby advancing ‘good’ science. According to the results, data sharing minimises or optimises the use of resources in the sense that the use of existing data implies that there is no need to collect fresh data. This is important considering that conducting research is never cheap because it requires resources such as money, personnel, and equipment which are beyond the reach of most researchers, especially those residing in developing countries. Results also showed that sharing empirically generated data has implications for policy formulation. Data can be seen as the raw material for helping to formulate policies which are the engines of national, regional, and world economies. This notion of policy formulation is connected to that of reduced costs in the sense that public and private sectors can access and use existing data to formulate long and short-term policies.
organisational policies. Skills and training in data sharing were highlighted in most publications analysed; perhaps because research stakeholders have channelled substantial resources towards research workshops covering issues such as data analysis using new methods, training in data sharing is uncommon. The possible reason research stakeholders have ignored this important research component is that data sharing is an emerging concept (Higgins, 2011). The implication is that a large proportion of researchers lack the necessary knowledge and skills or competencies for sharing data. It is not surprising that most studies highlighted the aspect of compensation as detrimental to data sharing because it seems data sharing offers little or no inherent benefits to researchers; they can only be acknowledged or cited by re-users. Although less than half of the studies highlighted the aspect of institutional policies, it is worth acknowledging that these policies are crucial in fostering data sharing at institutional level.

4.8. International level
Five factors that affect data sharing at the individual level were identified. Only five (4.8%) publications mentioned seniority, another five (4.8%) mentioned age, 95 (90.5%) mentioned lack of time, 29 (27.6%) mentioned control over data, and 25 (23.8%) mentioned data misappropriation. One key reason for lack of time on the part of researchers can be attributed to the fact that researchers are unquestionably busy people – they are usually busy with other pressing roles of teaching, supervising students, conducting research, and participating in community service. In addition to their formal teaching assignments, in recent years, researchers are increasingly under persistent pressure to participate in voluntary community service, one of the key aspects considered during their appraisals in universities. Control over data in this context entails a desire by researchers to have a say in the storage, use, and access of the data they share through various means including data repositories. As noted earlier, conducting research is a tedious task and researchers sacrifice time and resources. Therefore, researchers insist on maintaining control over the data they generate. Data misuse, also referred to as misinterpretation, is one of the factors that discourage researchers from sharing their data. Researchers want to be cited or acknowledged by those who use their data. If the data is misinterpreted either deliberately or inadvertently, the integrity or reputation of the primary data generators could be at risk. For this reason, researchers may not be willing to share their data especially in instances where they know they will not have any control over the way data is used. Only a few studies reported on the influence of age and seniority on data sharing; it is clear that younger and early career researchers are more reluctant to share their data than older and seasoned researchers.
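Each percentage reported above is the count of publications mentioning a factor divided by the 105 publications analysed. A minimal Python sketch of that arithmetic is given here for illustration only; the names `N_PUBLICATIONS`, `individual_counts`, and `share_of_publications` are assumptions introduced for this example and are not part of the study's method.

```python
# Total number of publications analysed in the review (n = 105).
N_PUBLICATIONS = 105

# Counts of publications mentioning each individual-level factor,
# as reported in the text. The dictionary name is hypothetical.
individual_counts = {
    "seniority": 5,
    "age": 5,
    "lack of time": 95,
    "control over data": 29,
    "data misappropriation": 25,
}


def share_of_publications(count, total=N_PUBLICATIONS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Print each factor in the same "count (percentage%)" form used in the text.
for factor, count in individual_counts.items():
    print(f"{factor}: {count} ({share_of_publications(count)}%)")
```

The same function can be applied to the institutional and international counts, since all three levels draw on the same set of 105 publications.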
Six factors that influenced data sharing at the international level were identified. Only 12 (11.2%) publications mentioned international research policies, 97 (92.4%) mentioned research funding agencies' policies, 95 (90.5%) mentioned publishers' policies, nine (8.6%) mentioned rights management issues, 70 (66.7%) mentioned ethical and legal norms, 69 (65.7%) indicated interoperability issues, and 80 (76.2%) indicated research data infrastructure. It can be concluded from these results that research grant organisations play a big role in fostering data sharing at international level. The role of publishers in data sharing is unsurprising considering that when researchers complete their research projects, their next inevitable step is to publish the findings in reputable journals. Journal publishers have gone a step further by rewarding researchers who share data by publishing the data and developing impact metrics. Results show that data published alongside manuscripts as supplements score higher on these metrics than data published separately. The possible reason is that reputable journal publishers are already trusted by authors, hence the data they publish is also held in high regard; in addition, such data is much easier for re-users to discover. Data infrastructure can be described as the nucleus of all data sharing activities and it is pervasively discussed in the literature. Data infrastructure in this context entails the technical infrastructure required to store, manage, maintain, and retrieve data. Data sharing infrastructure for researchers includes journal publishers' repositories where datasets are attached to published articles, institutional repositories, local servers, personal computers, free-standing devices, and laboratory repositories. It also includes web-based platforms such as personal websites or blogs, networks, and password protected web services such as Evernote, emails, social media, Google Drive, and DropBox.
In terms of ethical and legal norms, some data cannot be shared because of ethical complications; in most research studies, researchers enter into contractual consent with study participants in relation to the use of personal data they collect. A few studies discussed the issue of international research policies. The possible reason is that generally, there is an absence of uniform research policies because stakeholders pursue their own policies. In addition, most research policies exist at institutional and national level. Finally, only a few studies mentioned rights management, a bailiwick affecting data sharing. Intellectual property rights and data licensing are identified as two issues under rights management. The challenge is that, to date, there are no clear strategies for protecting data through intellectual property rights.
4.7. Institutional level
5. Discussion
Several issues related to data sharing at institutional level were identified. Ninety-seven (92.4%) publications cited data sharing skills as one of the factors that affected data sharing at institutional level, 89 (84.8%) indicated compensation, and 45 (42.9%) cited internal
5.1. Sources and citations
4.5. Data sharing at individual, institutional, and international levels
Attempts were made to find out the factors that either inspired or discouraged data sharing at three levels of the research community, namely individual, institutional, and international levels.
4.6. Individual level
Overall, the results of the study show that most research papers on data sharing are published in subscription databases. The reason could
be that researchers are motivated to publish in high impact journals which are usually subscription based. For example, Springer, Wiley Online Library, Sage, ScienceDirect, and Taylor & Francis all publish mostly subscription journals with high impact. It is, therefore, not surprising that peer-reviewed articles dominated the sources because most research findings are published in journals which are the key scholarly communication channels in the 21st century. The results suggest further that while researchers prefer publishing in high impact journals which are commonly subscription based, a larger number of them cite articles published in Open Access journals mainly because there are no access restrictions to such publications. The study exposes a key paradox of the scholarly publishing industry: while publishing research in subscription journals is mostly free of charge, access to published articles is neither free of charge nor inexpensive. An exception is when researchers pay article processing fees to have their publications made Open Access.
fraud. Other benefits which were mentioned but granted little attention by researchers include policy formulation and saving time.
5.4.1. Scientific progress
Owing to its potential to fuel scientific progress, debates about data sharing are increasingly dominating the international research agenda, as demonstrated by the study. Most of the articles analysed in this study pointed to data sharing as enhancing science. The study notes that re-analysis of data generated by other researchers helps to arrive at new breakthroughs. Results of the current study are confirmed by a study in Germany which found that data sharing makes “research better”, is a basis for “feedback and exchange”, provides “consistency in measures across studies to test the robustness of effect”, and enhances “reproducibility of one's own research” (Fecher et al., 2015, p. 14). As already highlighted, the study established that data sharing helps to curb research malpractices. This helps to maintain research integrity thereby advancing ‘good’ science. This is perhaps the reason some journal reviewers demand that, in addition to submitting manuscripts, authors should also attach data and its collection instruments for verification and validation. This is important because research malpractice is slowly gaining ground (Gupta, 2013). For instance, out of 742 English language research papers recorded as retracted in the PubMed database between 2000 and 2010, 26% were retracted due to fraud (Steen, 2011). Fraud may come in different forms and be committed for various reasons, but whatever the motives may be, any form of research fraud, including falsifying or fabricating data, should be dealt with decisively, and stringent mechanisms should be devised to prevent recurrences.
Therefore, the study strongly suggests that researchers should make data publicly available so that it can be verified and examined for its accuracy and reproducibility, thereby safeguarding against possible misconduct related to data fabrication and falsification or any other form of malpractice. The fact that data sharing contributes to the advancement of science explains why this concept has attracted unprecedented interest from many research stakeholders such as research grant organisations, journal publishers, government research agencies, research institutions, and some scientists. It is against this background that the current study suggests that governments, research institutions, universities, and other research stakeholders should enact policies that champion data sharing.
5.2. Overall growth
The study established that most sources were published in 2012 and 2013, when the number of publications reached 14 in each year. The upsurge during this period can be attributed to research grant organisations putting pressure on researchers to share data generated from research projects that they funded. Such organisations include, for example, the National Science Foundation of the USA (Cohn, 2012), the EU (European Commission, 2012; Fecher et al., 2015), and the National Research Foundation of South Africa (Chiware & Mathe, 2016, p. 2; Koopman, 2015; Matlatse, 2016). The number of publications in 2018 was lower than in the previous year. Since the search for inclusion in this analysis was completed in July 2018, it is possible to speculate that this number had increased by the end of 2018. These results confirm observations made by various authors (Bond-Lamberty, 2018; Bull et al., 2015; Chen & Wu, 2017; Curty et al., 2017; Dai et al., 2018; Dong & Li, 2016; Houtkoop et al., 2018; Kaye et al., 2018; Matlatse, 2016; Peng et al., 2016; Ross, 2016; Schmidt et al., 2016; Schöpfel, Ferrant, André, & Fabre, 2018; Wiley, 2018; Zvyagintseva, 2015) that the emerging research area of data sharing is increasingly attracting interest from researchers across disciplines.
5.3. Research regions
The USA contributed the most publications, followed by the UK, suggesting that these two countries in the English-speaking world are leaders in generating research data, hence the need for studies offering solutions to problems affecting data sharing. South Africa is the only country in Africa which contributed more than one article on data sharing. The reason is that the National Research Foundation of South Africa has helped to strengthen research activities by providing grants to researchers and research agencies and has declared that data generated from research it funds be deposited in Open Access repositories (Chiware & Mathe, 2016, p. 2; Koopman, 2015; Matlatse, 2016). The only non-English speaking country that contributed a considerable number of articles in English is China: two in Taylor & Francis, one in ScienceDirect, one in Springer, and one in PLOS One. This is not surprising because Peng et al. (2016) report that China's global research ranking rose from 17th in 1993 to second in 2013. These findings seem to suggest that research on data sharing is more entrenched in developed countries, perhaps because these countries have adequate financial capacity and political will to sponsor research, which translates into the generation of a large corpus of data which needs to be managed following research evidence practices.
5.4.2. Reduced costs
The study established that conducting research is expensive. Thus, re-use of data produced and shared by other researchers is financially sensible. Accessing research grants is becoming increasingly difficult (Borgman, 2012; Brown, Bruce, & Kernohan, 2015; Cohn, 2012); hence, using previously generated data becomes one of the key alternatives for researchers. It is therefore understandable that the current study reveals that using existing data is hailed as cost-effective. A study by Yoon (2015) also found that professors, research scientists, and doctoral students from the fields of public health and social work were motivated to use existing data because of cost-effectiveness issues. Although developing a data sharing infrastructure appears to be expensive, researchers have argued that the benefits outweigh the costs incurred (Bond-Lamberty, 2018; Ross, 2016; Shakeri, 2013; Tenopir et al., 2011, p. 1; Yoon, 2015; Zvyagintseva, 2015). Similarly, data sharing, re-use, and collaboration can minimise the cost and redundancy of research data production (Bond-Lamberty, 2018; Dai et al., 2018; Kaye et al., 2018; Ross, 2016; Shakeri, 2013; Tenopir et al., 2011; Yoon, 2015; Zvyagintseva, 2015). The current study indicates that the re-use of data is more cost-effective than collecting fresh data. It is therefore important for research stakeholders to champion and embrace the concept of data sharing.
5.4. Benefits: scientific progress, reduced costs, and reduced research fraud
Three key benefits were identified from most publications, namely that data sharing advances science, minimises costs, and reduces research
5.4.3. Policy formulation and time saving
Closely related to the benefit of reduced costs is the notion of policy formulation. The study found that institutions rely on existing data instead of embarking on new research processes which require huge investments in the form of time, money, personnel, and equipment. It is noteworthy that text publications and secondary data can both be used by policy makers to make informed decisions. However, Woolfrey (2009) notes that while the former find their way into public policy through various means such as publication in journals, there are no clear means through which the latter can reach policy makers. The study therefore suggests that there is a need for the research community to come up with sound strategies for sharing data. Such strategies could include establishing online repositories in which data can be deposited for re-use purposes. The danger is that if data is inaccessible, researchers will continue losing money and other resources through duplicating data collection.
similar findings. A study by Cragin et al. (2010) found that misuse incidents decreased the degree to which researchers would be willing to share their data. In some studies, researchers have admitted to having misinterpreted data principally because they lacked the skills to read, understand, analyse, and interpret data thereby arriving at incorrect conclusions. When this happens, the owners of such data feel that their data has been misinterpreted or misappropriated (Albert, 2012; Bertzky & Stoll-Kleemann, 2009; Bull et al., 2015; Cooper, 2007; Cragin et al., 2010, p. 4034; Elsayed & Saleh, 2018; Enke et al., 2012; Perrino et al., 2013; Zimmerman, 2008). Fecher et al. (2015, p. 16) identified four typical examples of intentional misuse of data which are dangerous and threaten the public good of data sharing. They include “falsification, commercial misuse, competitive misuse, flawed interpretation, and unclear intent” (Fecher et al., 2015, p. 16). To deal with this problem, three solutions were suggested. Permission to use the data should be requested and granted by owners of data (Fecher et al., 2015). Original generators of data can be consulted to crosscheck and verify the accuracy of the outcomes from the re-analysis of data before making them public. Finally, equipping researchers with ideal skills for data re-use is paramount.
5.5. Data sharing at individual, institutional, and international level
5.5.1. Individual level
Three key factors that influence data sharing at the individual level were revealed, namely lack of time, control over data, and data misappropriation. Few studies reported on seniority and age.
5.5.1.4. Seniority and age. Senior and experienced researchers are more willing to share data than early career researchers. Differences in research expertise may have influenced this result. Seasoned researchers are more experienced in conducting research and are likely to generate quality and reliable data which is likely to be used by other researchers. On the other hand, early career researchers are still learning how to conduct research and may generate data of lower quality which they might not be willing to share. Because of maturity, seasoned researchers may see data sharing as a moral and professional obligation, meaning they share data even in the absence of incentives. They see more value in sharing data than junior academics (Milia et al., 2012). Similarly, Tenopir et al. (2011) found that age affected data sharing, attributing the impact to seniority. To encourage young and early career researchers to share data, it is important to put in place deliberate strategies that compel seasoned researchers to mentor them in research skills so that they produce data which is of high quality for re-use.
5.5.1.1. Lack of time. It must be restated upfront that time was the most dominant factor confronting data sharing efforts at the individual level. Researchers, particularly those attached to universities, are busy with other pressing roles of teaching, supervising students, conducting research, partaking in their institutions' administrative activities, and participating in community service. Similar to the current findings, Tenopir et al. (2011) reported that researchers exclusively involved in research activities are more likely to find time to share their data with colleagues than researchers who are saddled with other time-consuming responsibilities such as teaching and administrative work. One may argue that data sharing should be perceived as part of the research obligation and that lack of time to share data is a poor excuse. However, as already acknowledged, data sharing is a rather new concept (Higgins, 2011; Matlatse, 2016) which has been ignored for centuries; emphasis has been on text articles. Therefore, efforts need to be put in place to change the mind-set of researchers about this concept. Awareness raising and incentives are some of the ways that may quickly enhance acceptance of the concept amongst researchers.
5.5.2. Institutional level
Every researcher is generally affiliated to a particular institution or organisation such as a university or a research institution. Skills and training, compensation, and internal research policies are the key factors that affect data sharing at this level.
5.5.1.2. Control over data. Researchers were interested in sharing their data on condition that they put restrictions over access of the data they shared. Asking for control over the data does not mean researchers are reluctant to share it; they just want to know where their data is stored, who is using their data, the purposes for which it is used, and at least be acknowledged or cited by those re-using the data. Similarly, in Germany, 80% of researchers were of the view that control of shared data was paramount and necessary (Fecher et al., 2015). Researchers may need to have control over their data perhaps because there are some uncertainties surrounding data sharing in mainstream data sharing repositories. One outstanding unresolved issue is intellectual property rights (Cahill & Passamano, 2007; Costello, 2009; Delson, Harcourt-Smith, Frost, & Norris, 2007; Enke et al., 2012; Milia et al., 2012). Since granting researchers more control over data encourages them to share more, the study suggests that formulators of institutional and national research policies need to clearly grant researchers reasonable control over the data they share.
5.5.2.1. Training in data sharing. Knowledge and skills in data sharing are hailed in the literature as necessary in achieving meaningful sharing and data re-use. Some researchers admitted to having misinterpreted other researchers' data due to a lack of skills. Other studies have also reported that lack of adequate skills is a key factor that stymies data sharing at institutional level (Bull et al., 2015; Clement, Blau, Abbaspour, & Gandour-Rood, 2016; Enke et al., 2012; Houtkoop et al., 2018; Specht et al., 2015; Wallis et al., 2013). A lack of skills such as discovering data, discerning datasets for suitable analysis, and determining the quality of data has been reported in the USA (Curty et al., 2017; Schumacher & VandeCreek, 2015), which is the leading country in research. Equipping researchers with data sharing skills is an important approach in minimising this challenge. Training can be delivered in various ways such as through WeChat, online courses, phone/email, workshops, and library blogs (Chen & Wu, 2017), school curriculum (Piwowar, 2011; Teeters et al., 2008; Volk, Lucero, & Barnas, 2014), and training workshops (Brown et al., 2015; Clement et al., 2016, p. 113). This study suggests that, as data sharing is an emerging notion, universities and research institutions should bear the responsibility of equipping researchers with basic and advanced skills for curating and sharing data. Training may focus on equipping researchers with basic skills such as data formats, documenting
5.5.1.3. Fears of data misuse. Researchers were hesitant to share their data because they feared potential users might misuse it through commercial exploitation; they were further afraid that re-users might misunderstand their data thereby arriving at incorrect conclusions that could endanger their professional integrity. Other studies have reported
protocols and data management. Data literacy programmes (Koltay, 2017) are also ideal because they focus on activities such as the standard method for citing datasets and encouraging researchers to re-use and cite the data of other scientists. Since librarians are well grounded in delivering information literacy programmes, they are equally well placed to deliver data literacy programmes.
data sharing amongst researchers through these policies. The implication is that researchers have no option but to bow to funders' demands. For example, Schmidt et al. (2016) report that most researchers who receive research funding from agencies under the Belmont Forum agreed that funder policies were key in compelling them to share data. The Belmont Forum is used as an example in this context because it is an influential grouping of representatives of research grant organisations which directs global research actions (Schmidt et al., 2016). More authors (Bull et al., 2015; Enke et al., 2012; Fecher et al., 2015, p. 1; Schmidt et al., 2016; Wallis et al., 2013; Wiley, 2018) have independently described research grant organisations as key players in data sharing. However, the study noted that despite not living up to these data sharing policy requirements, some researchers are not reprimanded. The problem is attributed to many irregularities in existing policies. Most research funding organisations are not consistent in enforcing their own policies. For instance, despite enacting a comprehensive and appealing data sharing policy that compels its research fund recipients to make their data publicly accessible, the National Science Foundation of the USA has demonstrated little effort to enforce the policy (Borgman, 2012; Cohn, 2012). The problem is exacerbated by complicated donor policies regarding data sharing which vary from funder to funder and between disciplines. In this spirit, the study proposes that funding agencies need to be strict in ensuring that their own contractual obligations they enter into with their grant recipients are adhered to all the time. In order to monitor that researchers do not exploit loopholes, weak policies should be revised to seal all escape gaps thereby making them more stringent and impactful. 
More importantly, there is a need for donors to develop policies which are easy for researchers to understand; policies which are written in ‘plain language’. This is important because despite considerable interest from donors in data sharing, it remains difficult for researchers to win funding for such an activity. Stringent but clear data sharing policies will be easier for researchers to understand, interpret, and adhere to (Borgman, 2012; Brown et al., 2015; Cohn, 2012; Huang et al., 2012; Huang, Hawkins, & Gexia, 2013; Kaye et al., 2018; Parr, 2007; Pearce & Smith, 2011; Piwowar, 2011; Schmidt et al., 2016; Wiley, 2018).
5.5.2.2. Lack of compensation. Many researchers are unwilling to share their data due to the absence of rewards or incentives for their data sharing activities. Researchers who have put a lot of effort into generating data are not promoted for sharing their data. This contrasts with research publications which universities hold in high regard when conducting academic appraisals and recruitment. This shortcoming has been previously highlighted in studies conducted in Germany (Fecher et al., 2015) and the USA (Tenopir et al., 2011). The power of incentives cannot be undervalued if observations by Fecher et al. (2015), Tenopir et al. (2011), and Woolfrey (2009) are to be considered; they all observed that data sharing attracts no tangible benefits such as recognition, and this deters some researchers from sharing their data. Because of the status quo, researchers will dedicate more time to preparing and sharing final research findings which bring them recognition (Woolfrey, 2009). Clearly, research institutions have a role to play in motivating researchers to share data. They need to start recognising researchers by providing them with funds to carry out data sharing activities during and after completion of their research projects. More importantly, there is a need for research institutions to put in place comprehensive data curation instructions and better procedures for recognising and rewarding those researchers that share their data.
5.5.2.3. Organisational policies. Evidence emerged from the study that policies can either stimulate or prevent researchers from sharing data. Some organisations have either knowingly or unknowingly implemented research policies which are discriminatory in the sense that some policies give preference to particular data types. In other words, policies (Volk et al., 2014; Walters & Skinner, 2011, p. 31) will sometimes dictate which data sets deserve an institution's resources and recognition. Huang et al. (2012) note that institutional cultures which exclude a reward system in their policies discourage researchers from sharing data. The current study notes that research institutions' policies are inspired by national policies which can also affect data sharing activities in organisations. This study suggests, therefore, that institutions should only implement policies which demonstrate and promote fairness across all categories of researchers in terms of allocation of resources for carrying out data sharing activities. These policies should clearly spell out how researchers who share their data are incentivised or rewarded. Furthermore, national policies need to clear all bottlenecks choking access to data financed by taxpayers' money. Consistent and just policies will stimulate researchers to share data and will turn them into ‘evangelists’ of data sharing, hence the need to transform bureaucratic data sharing policies.
5.5.3.2. Publishers' policies. Scholarly journal publishers have taken bold steps towards the realisation of data sharing at international level. Publishers have taken advantage of their indispensability to researchers by subsuming data sharing into the publishing process. Some prestigious journal publishers such as Nature, Science, Elsevier, Atmospheric Chemistry and Physics, F1000Research, and PLOS One, to mention some of the most notable ones, have adopted and enacted policies that compel researchers to submit data as supplements to manuscripts. Some publishers have more stringent policies which push researchers further to share data with peers in their discipline. For example, the International Committee of Medical Journal Editors, which is a prominent body of revered publishers of medical journal articles (Ross, 2016), demands that clinical trials published in its member journals should have their data shared with external investigators. This may be one way to propel science in the field of clinical research towards data re-use. Many authors have also suggested that journal policies are of prime importance in strengthening data sharing (Bond-Lamberty, 2018; Fecher et al., 2015, p. 1; Savage & Vickers, 2009; Van Horn & Gazzaniga, 2013). Publishers' policies are working to the advantage of researchers because data deposited in publishers' repositories receive a better citation impact. As a result, some studies have reported that researchers are interested in sharing data for visibility and recognition through citation (Costello, 2009; Enke et al., 2012; Fecher et al., 2015, p. 1; Jeng, He, & Oh, 2016; Parr, 2007; Piwowar, 2011; Specht et al., 2015). This study suggests that publishers need to provide an easily accessible link to data sets for other researchers. Yet, despite their efforts towards solidifying data sharing, publishers' policies are rather weak and differ extensively.
5.5.3. International
Issues that dominate data sharing at national and international levels are research funding agencies' policies, publishers' policies, interoperability, and infrastructure. Less dominant issues include international research policies, rights management, and ethical and legal norms.
5.5.3.1. Research funding agencies' policies. Evidence emerged from the literature that these organisations have formulated, adopted, and implemented policies that compel recipients of their funding to make data freely accessible. The impact of these research grant institutions on data sharing is far and wide considering that they operate across national, regional, and international borders, reaching out to thousands of researchers. As most researchers cannot do without funds, research grant institutions are taking this opportunity to popularise and promote
Wiley (2018) reports that 76% of engineering journals have weak research data sharing policies. Publishers should therefore review their policies and make amendments to clauses which allow researchers to continue violating the policies with impunity. More importantly, formulating consistent and harmonised data sharing policies will be more appropriate in moving towards universal data sharing.
& Schonfeld, 2009; Bull et al., 2015; Cooper, 2007; Enke et al., 2012; Haddow, Bruce, Sathanandam, & Wyatt, 2011; Harding et al., 2013; Kostkova, 2018; Kowalczyk & Shankar, 2011; Levenson, 2010; Mbuagbaw, Foster, Cheng, & Thabane, 2017; Mennes, Biswal, Castellanos, & Milham, 2013; Sheather, 2009; Takashima et al., 2018). The unfortunate but justifiable development is that litigation can be instituted against researchers who innocently share sensitive data for the public good. Obtaining permission or consent from participants at the outset of data collection to have resultant data shared with the public is important. In this scenario, privacy and anonymity requirements should be strictly and consistently adhered to by removing all details that may provide clues to the identities of participants.
5.5.3.3. Research data infrastructure. According to the findings, data infrastructure includes hardware, software, and storage facilities. Some popular data storage facilities include publishers' repositories, local servers, personal computers, free-standing devices, laboratory repositories, personal websites or blogs, and password protected web services (Evernote, emails, social media, Google Drive and DropBox). However, institutional data repositories are the most appropriate storage facilities. Their popularity can be attributed to specialists such as librarians and Information Technology (IT) personnel managing these institutional repositories. More importantly, since these repositories are managed by qualified staff, it is expected that data stored in these repositories is well described for easy access and retrieval, well-maintained for continued access, and security is inevitably guaranteed through proper back-up systems. However, setting up these repositories requires huge monetary investments, yet most institutions operate on meagre and unsustainable budgets. Institutions need to purchase hardware, software for supporting current and future needs of data sharing, and to recruit personnel with the right mix of skill sets for working with metadata, installing and updating software, troubleshooting the hardware, and offering support services to users of the data repository. Due to a lack of investments, many cases of poor and unreliable hardware, software, and support of the computer environment have been reported (Brown et al., 2015; Coady & Wagner, 2013, p. 1; Fecher et al., 2015; Peng et al., 2016; Permanent Access to the Records in Europe, 2009; Schöpfel et al., 2018; Schumacher & VandeCreek, 2015; Shakeri, 2013; Tenopir et al., 2011; Volk et al., 2014; Wallis et al., 2013, p. 2). The problem of infrastructure is more pronounced in developing countries due to resource constraints. 
As a result, some rich data produced in developing countries do not adequately benefit these countries. For example, Anane-Sarpong et al. (2017) observe that continents like Africa produce rich data related to high-risk diseases but when such data is shared, it does not scientifically benefit Africans because of limited resources. In other words, such data mainly benefits developed countries where this data is mostly domiciled, preserved, and shared for re-use. African governments need to learn from South Africa which through the National Research Foundation (Chiware & Mathe, 2016; Koopman, 2015; Matlatse, 2016) has created a robust research infrastructure. Consequently, the allocation of adequate funding for setting up robust data infrastructure is imperative for improved data sharing across the globe.
5.5.4. Absence of international research policies
The only known international policies are those championed by publishers and research grant organisations. The lack of harmonized international research data frameworks has led to disparities in national policies currently enforced by research institutions. For instance, an international study by Enke et al. (2012) revealed that, while USA scientists complied with a national policy compelling them to share data publicly, contrary policies existed in Canada and Germany. This study proposes that influential research grant organisations should consider reconciling their funding policies with international data sharing policies.
5.5.4.1. Rights management. Two key issues involved in rights management are intellectual property rights and data licensing. Considering that there are no clear strategies for having data copyrighted, researchers are reluctant to share data that they perceive as their intellectual property. Similarly, there is no evidence of the existence of procedures for obtaining or granting data licences. It is for this reason that researchers may not be willing to make their data public, as they believe it may be susceptible to exploitation. Although there is limited literature on these themes, some studies have reported that the absence of guidance on rights management undermines data sharing efforts (Shakeri, 2013; Walters & Skinner, 2011, p. 31). Again, some studies have shown that the lack of clear-cut licensing strategies has discouraged researchers from sharing data in Europe (Tenopir et al., 2011, p. 2; Walters & Skinner, 2011) and in Asia (Chen & Wu, 2017).
6. Summary and reflection on the findings
Data sharing is an emerging concept which needs to be embraced because of the many benefits it brings to the research community. Data sharing drives scientific progress, minimises research fraud, and reduces research costs.
Generally, the study found that regional data sharing imbalances exist, with the USA and UK leading the race. Regions such as Africa lag far behind the USA, Europe, and Asia. Research funding agencies and publishers are key enforcers of data sharing at the international level. Their data sharing policies compel many researchers to release their research data for public access and re-use. However, current policies by both research grant organisations and publishers are weak because they are not legally binding. Many researchers who violate these policies are not reprimanded. Regardless of the pressure exerted by research stakeholders to make data publicly accessible, many factors frustrate these efforts at individual, institutional, and international levels. The key factors at the individual level include lack of time, control over data, and data misappropriation. At the institutional level, the factors include lack of data sharing skills, lack of compensation, and unfavourable internal policies. At the international level, key factors include ethical and legal norms, lack of data infrastructure, and interoperability issues. A summary of key themes emerging from the review is presented in Table 2; the summary is divided into two parts: factors motivating data sharing and challenges.
5.5.3.4. Ethical and legal norms. Ethical and legal obligations limit what researchers can do with data beyond the core purpose for which participants' consent was sought and granted. Researchers are therefore restricted from sharing such data, as doing so could conflict with their own assurances and amount to serious research misconduct. According to Brakewood and Poldrack (2013), Bull et al. (2015), Deng (2016), Fecher et al. (2015), Kostkova (2018), Mbuagbaw et al. (2017), Sayogo and Pardo (2013), Takashima et al. (2018), and Wittenberg and Elings (2017), respect for contractually informed consent and confidentiality is relevant to most data relating to individuals. Ethical issues become more complicated if data is collected from children and teenagers because, under international law, children and young adults cannot make independent decisions, meaning parents must grant consent. It is unethical to make data public without initial clearance from research participants. Sharing some data could harm people or may cause public uproar if it reaches those who may be aggrieved by such data (Anderson
Table 2
Data sharing motivating factors and challenges (n = 105).

| Themes | f | % | Some selected sources |
|---|---|---|---|
| Motivating factors | | | |
| Scientific progress | 104 | 99 | Anane-Sarpong et al. (2017), Bond-Lamberty (2018), Coady and Wagner (2013), Curty et al. (2017), Dai et al. (2018), Elsayed and Saleh (2018), Fecher et al. (2015), Fisher and Fortmann (2010), Fry et al. (2008), Kaye et al. (2018), Kostkova (2018), McCullough (2009), Pisani and AbouZahr (2010), Ross (2016), Rowhani-Farid et al. (2017), Takashima et al. (2018) and Watson (2015). |
| Data sharing policies by funding agencies | 97 | 92.4 | Dai et al. (2018), Davenport and Patil (2012), Dong and Li (2016), Bishoff and Johnston (2015), Bond-Lamberty (2018), Bull et al. (2015), Charbonneau (2013), Curty et al. (2017), Enke et al. (2012), Fecher et al. (2015), Guedon (2015), Houtkoop et al. (2018), Kaye et al. (2018), Matlatse (2016), Mundel (2014), Peng et al. (2016), Pitt and Tang (2013), Ross (2016), Schmidt et al. (2016), Schöpfel et al. (2018), Wallis et al. (2013), Wiley (2018) and Zvyagintseva (2015). |
| Reduces costs | 96 | 91.4 | Bond-Lamberty (2018), Dai et al. (2018), Kaye et al. (2018), Ross (2016), Shakeri (2013), Tenopir et al. (2011), Yoon (2015) and Zvyagintseva (2015). |
| Data sharing policies by publishers | 95 | 90.5 | Bishoff and Johnston (2015), Charbonneau (2013), Chen and Wu (2017), Curty et al. (2017), Davenport and Patil (2012), Dong and Li (2016), Enke et al. (2012), Fecher et al. (2015), Guedon (2015), Matlatse (2016), Mundel (2014), Peng et al. (2016), Schmidt et al. (2016), Wiley (2018) and Wallis et al. (2013). |
| Safeguards against research fraud | 69 | 65.7 | Curty (2015), Deng (2016), Doorn et al. (2013), Fry et al. (2008), Kaye et al. (2018), Ross (2016), Rowhani-Farid et al. (2017), Schmidt et al. (2016), Shakeri (2013), Takashima et al. (2018), Tenopir et al. (2011) and Wallis et al. (2013). |
| Challenges | | | |
| Data sharing skills | 97 | 92.4 | Anane-Sarpong et al. (2017), Clement et al. (2016), Curty et al. (2017), Deng (2016), Houtkoop et al. (2018), Enke et al. (2012), Schmidt et al. (2016), Fecher et al. (2015), Van Horn and Gazzaniga (2013), Volk et al. (2014) and Wallis et al. (2013). |
| Lack of compensation | 89 | 84.8 | Burgi et al. (2016), Anane-Sarpong et al. (2017), Jeng et al. (2016) and Peng et al. (2016). |
| Lack of infrastructure | 80 | 76.2 | Anane-Sarpong et al. (2017), Brown et al. (2015), Fecher et al. (2015), Peng et al. (2016), Schmidt et al. (2016), Schöpfel et al. (2018), Shakeri (2013), Tenopir et al. (2011) and Volk et al. (2014). |
| Legal and ethical issues | 70 | 66.7 | Anderson and Schonfeld (2009), Brakewood and Poldrack (2013), Bull et al. (2015), Chen and Wu (2017), Cooper (2007), Fecher et al. (2015), Kostkova (2018), Levenson (2010), Mbuagbaw et al. (2017), Mennes et al. (2013), Ross (2016), Schmidt et al. (2016), Takashima et al. (2018), Tenopir et al. (2011), Walters and Skinner (2011) and Wittenberg and Elings (2017). |
| Interoperability | 69 | 65.7 | Bezuidenhout (2013), Costello (2009), Curty et al. (2017), Enke et al. (2012), Kaye et al. (2018), Nelson (2009), Plengsaeng, Wehn and van der Zaag (2014), Sansone and Rocca-Serra (2012), Schmidt et al. (2016), Schöpfel et al. (2018), Scott (2014), Teeters et al. (2008), Tenopir et al. (2011), Volk et al. (2014), Yoon (2015), Yoon and Schultz (2017), He (2016) and Woolfrey (2009). |
| Internal organisational policies | 45 | 42.9 | Bond-Lamberty (2018), Bull et al. (2015), Chandramohan et al. (2008), Huang et al. (2012), Kaye et al. (2018), Ostell (2009), Rohlfing and Poline (2012), Tucker (2009), Volk et al. (2014) and Walters and Skinner (2011). |
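The percentages reported in Table 2 appear to be each theme's frequency (f) expressed as a share of the 105 reviewed publications. The following is a minimal Python sketch of that arithmetic, not the authors' procedure; the names `N_PUBLICATIONS`, `REPORTED`, `share_of_publications` and `mismatches` are illustrative assumptions, and only the frequency and percentage pairs are taken from the table.

```python
# Number of publications reviewed (n = 105 in Table 2).
N_PUBLICATIONS = 105

# (frequency, percentage as printed in Table 2); theme labels are omitted
# because the calculation depends only on the frequency.
REPORTED = [
    (104, 99.0), (97, 92.4), (96, 91.4), (95, 90.5), (69, 65.7),
    (89, 84.8), (80, 76.2), (70, 66.7), (45, 42.9),
]


def share_of_publications(frequency, total=N_PUBLICATIONS):
    """Percentage of reviewed publications mentioning a theme, to one decimal place."""
    return round(100 * frequency / total, 1)


def mismatches(rows, total=N_PUBLICATIONS):
    """Return the rows whose printed percentage differs from the recomputed one."""
    return [
        (f, pct)
        for f, pct in rows
        if abs(share_of_publications(f, total) - pct) > 0.05
    ]


if __name__ == "__main__":
    for f, pct in REPORTED:
        print(f"{f}/{N_PUBLICATIONS} -> {share_of_publications(f)}% (reported {pct}%)")
    print("mismatches:", mismatches(REPORTED))
```

Under this reading, every printed percentage agrees with f/105 to one decimal place, which suggests the table counts how many of the 105 publications mention each theme.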
6.1. Personal reflection on the findings
Despite resolute efforts to promote data sharing, relatively little data is shared. Data sharing will only become a success if researchers put as much effort into managing data as they put into preparing a manuscript for publication. To move towards achieving this, research stakeholders need to take proper interventions to address several issues. There is a need to recognise researchers who share data by citing and acknowledging their data. Only when researchers are incentivized in the same way they are rewarded for sharing their research findings will data sharing thrive. This will require putting in place proper reward strategies. Universities, research institutions, journal publishers, and research funding agencies need to collaborate in popularising the concept of free data sharing amongst researchers through advocacy programs. Advocacy programs should focus on clarifying the benefits of data sharing and reuse, thereby influencing researchers' view of data sharing as ‘good behaviour in science’. Advocacy activities should include training in data sharing. There is a need to make resources available for building a robust data infrastructure at all levels of the research hierarchy. Building, developing, and acquiring new applications and standards that promote interoperability and can be adopted across institutional and disciplinary divides is fundamental. Of course, data sharing may not bear instant benefits to individuals, institutions, or governments. Benefits may take time, but that should not be an excuse for not investing in data sharing. Solutions to world problems should not only be short-term; current researchers should lay a good foundation for future generations by preserving data that is necessary for solving future complex problems. Data sharing policies are inconsistent and weak; more stringent policies are required. More importantly, the formulation of policies that favour all types of research data is necessary; there should be no room for policies that favour certain types of data over others.
7. Conclusion
This study is a comprehensive review of diverse issues regarding data sharing, focusing on three levels of the research hierarchy: individual, institutional, and international. The study synthesises and reports key challenges affecting data sharing by reviewing the literature from various parts of the world. Apart from revealing the factors affecting data sharing, the study proposes solutions to key challenges. Research stakeholders such as research grant institutions, government agencies, publishers, librarians, and scientists could use the solutions that have been suggested as a starting point for improving or implementing data sharing. Factors affecting data sharing at the three research hierarchy levels are unique but also interwoven, allowing policy makers and research stakeholders to adopt different approaches when implementing and promoting data sharing at each level. Finally, this research sets the tone for further discussion regarding data sharing and it could provoke salient questions that may trigger further research in this emerging discipline.
8. Future research
This systematic review focused on data sharing in general. With a steady rise in research on data sharing, as evidenced by this review, future reviews should focus on the literature in particular disciplines or specializations such as life sciences, tourism, and humanities, to mention a few. Moreover, despite common knowledge that libraries are key partners in data sharing, the study did not adequately extend its scope to the role of libraries in data sharing activities. Future studies should review existing literature with the aim of cataloguing the global trends, contributions, and challenges hindering libraries' efforts in data sharing activities.
References
date: 14 May 2018. Curty, R. G., Crowston, K., Specht, A., Grant, B. W., & Dalton, E. D. (2017). Attitudes and norms affecting scientists' data reuse. PLoS ONE, 12(12), e0189288. https://doi.org/ 10.1371/journal.pone.0189288. Dai, S. Q., Li, H., Xiong, J., Ma, J., Guo, H. Q., Xiao, X., & Zhao, B. (2018). Assessing the extent and impact of online data sharing in eddy covariance flux research. Journal of Geophysical Research – Biogeosciences, 123(1), 129–137. https://doi.org/10.1002/ 2017JG004277. Davenport, T. H., & Patil, D. J. (2012). Data scientist. Harvard Business Review, 90(5), 70–76. Delson, E., Harcourt-Smith, W. E. H., Frost, S. R., & Norris, C. A. (2007). Databases, data access, and data sharing in paleoanthropology: First steps. Evolutionary Anthropology, 16, 161–163. Deng, X. (2016). Urgent need for a data sharing platform to promote ecological research in China. Ecosystem Health and Sustainability, 2(9), https://doi.org/10.1002/ehs2. 1241 e01241. Dong, R., & Li, S. (2016). Let scientific data sharing become the new normal for Chinese ecologists. Ecosystem Health and Sustainability, 2(5), https://doi.org/10.1002/ehs2. 1218 e01218. Doorn, P., Dillo, I., & Van Horik, R. (2013). Lies, damned lies and research data: Can data sharing prevent data fraud? International Journal of Digital Curation, 8, 229–243. https://doi.org/10.2218/ijdc.v8i1.256. Dubé, L., & Paré, G. (2003). Rigor in information systems positivist case research: Current practices, trends, and recommendations. MIS Quarterly, 597–636. Elsayed, A. M., & Saleh, E. I. (2018). Research data management and sharing among researchers in Arab universities: An exploratory study. IFLA Journal, 44(4), 281–299. https://doi.org/10.1177/0340035218785196. Enke, N., Thessen, A., Bach, K., Bendix, J., Seeger, B., & Gemeinholzer, B. (2012). The user's view on biodiversity data sharing - investigating facts of acceptance and requirements to realize a sustainable use of research data. 
Ecological Informatics, 11, 25–33. https://doi.org/10.1016/j.ecoinf.2012.03.004. European Commission (2012). Scientific data: Open access to research results will boost Europe's innovation capacity. Retrieved from http://europa.eu/rapid/press-release_ MEMO-12-565_en.htm?locale=en. Fecher, B., Friesike, S., & Hebing, M. (2015). What drives academic data sharing? PLoS ONE, 10(2), e0118053. https://doi.org/10.1371/journal.pone.0118053. Fienberg, S. E., Martin, M. E., & Straf, M. L. (1985). Sharing research data. Washington, D.C.: National Academy Press. Fisher, J. B., & Fortmann, L. (2010). Governing the data commons: Policy, practice, and the advancement of science. Information & management, 47(4), 237–245. https://doi. org/10.1016/j.im.2010.04.001. Friedlander, A., & Adler, P. (2006). To stand the test of time: Long-term stewardship of digital data sets in science and engineering. Arlington, Virginia: Association of Research Libraries. Fry, J., Lockers, S., Oppenheim, C., Houghton, J., & Rasmussen, B. (2008). Identifying benefits arising from the curation and open sharing of research data produced by UK higher education and research institutes. UK: Loughborough University/Centre for Centre for Strategic Economic Studies. Glaeser, P. S. (1990). Scientific and technical data in a new era. New York, NY: Hemisphere Publishing Corporation. Guedon, J. C. (2015). Open data and science: Towards optimizing the research process. Retrieved from https://www.dataone.org/webinars/open-data-and-science-towardsoptimizing-research-process. Gupta, A. (2013). Fraud and misconduct in clinical research: A concern. Perspectives in Clinical Research, 4(2), 144–147. https://doi.org/10.4103/2229-3485.111800. Haddow, G., Bruce, A., Sathanandam, S., & Wyatt, J. C. (2011). ‘Nothing is really safe’: A focus group study on the processes of anonymizing and sharing of health data for research purposes. Journal of Evaluation in Clinical Practice, 17(6), 1140–1146. 
https://doi.org/10.1111/j.1365-2753.2010.01488.x. Harding, A., Harper, B., Stone, D., O'Neill, C., Berger, P., Harris, S., & Donatuto, J. (2013). Conducting research with tribal communities: Sovereignty, ethics, and data-sharing issues. Environmental Health Perspectives, 120(1), 6–10. https://doi.org/10.1289/ehp. 1103904. He, Y. (2016). Data sharing across research and public communities. Doctoral thesis)College Park, MD: University of Maryland. https://doi.org/10.13016/M2Z84M. Heidorn, P. B. (2008). Shedding light on the dark data in the long tail of science. Library Trends, 57(2), 280–299. https://doi.org/10.1353/lib.0.003. Heidorn, P. B. (2011). The emerging role of libraries in data curation and E-science. Journal of Library Administration, 51, 662–672. https://doi.org/10.1080/01930826. 2011.601269. Higgins, S. (2011). Digital curation: The emergence of a new discipline. The International Journal of Digital Curation, 2(6), 78–88. https://doi.org/10.2218/ijdc.v6i2.191. Housewright, R., Schonfeld, R. C., & Wulfson, K. (2013). Ithaka S+ R US faculty survey 2012. New York: Ithaka S+ R. Houtkoop, B. L., Chambers, C., Macleod, M., Bishop, D. V., Nichols, T. E., & Wagenmakers, E. J. (2018). Data sharing in psychology: A survey on barriers and preconditions. Advances in Methods and Practices in Psychological Science, 1(1), 70–85. Huang, X., Hawkins, B. A., Lei, F., Miller, G. L., Favret, C., Zhang, R., & Qiao, G. (2012). Willing or unwilling to share primary biodiversity data: Results and implications of an international survey. Conservation Letters, 5(5), 399–406. https://doi.org/10. 1111/j.1755-263X.2012.00259.x. Huang, X., Hawkins, B. A., & Gexia, Q. (2013). Biodiversity data sharing: Will peer- reviewed data papers work? BioScience, 63, 5–6. Jeng, W., He, D., & Oh, J. S. (2016). Toward a conceptual framework for data sharing practices in social sciences: A profile approach. 
Proceedings of the 79th ASIS&T Annual Meeting: Creating knowledge, enhancing lives through Information & Technology.
Ahmed, Y. A., Ahmad, M. N., Ahmad, N., & Zakaria, N. H. (2018). Social media for knowledge-sharing: A systematic literature review. Telematics and Informatics. https://doi.org/10.1016/j.tele.2018.01.015. Albert, H. (2012). Scientists encourage genetic data sharing. Springer Healthcare News, 1, 1–2. https://doi.org/10.1007/s40014-012-1434-z. Anane-Sarpong, E., Wangmo, T., Ward, C. L., Sankoh, O., Tanner, M., & Elger, B. S. (2017). “You cannot collect data using your own resources and put it on open access”: Perspectives from Africa about public health data-sharing. Developing World Bioethics, 1–12. https://doi.org/10.10.1111/dewb.12159. Anderson, J. R., & Schonfeld, T. L. (2009). Data-sharing dilemmas: Allowing pharmaceutical company access to research data. IRB: Ethics & Human Research, 31(3), 17–19. https://doi.org/10.2307/25594876. Bertzky, M., & Stoll-Kleemann, S. (2009). Multi-level discrepancies with sharing data on protected areas: What we have and what we need for the global village. Journal of Environmental Management, 90(1), 8–24. Bezuidenhout, L. (2013). Data sharing and dual-use issues. Science and Engineering Ethics, 19(1), 83–92. https://doi.org/10.1007/s11948-011-9298-7. Bishoff, C., & Johnston, L. (2015). Approaches to data sharing: An analysis of NSF Data Management plans from a large research university. Journal of Librarianship & Scholarly Communication, 3(2), eP1231. https://doi.org/10.7710/2162-3309.1231. Bond-Lamberty, B. (2018). Data sharing and scientific impact in eddy covariance research. Journal of Geophysical Research – Biogeosciences, 123(4), 1440–1443. https:// doi.org/10.1002/2018JG004502. Borgman, C. L. (2012). The conundrum of sharing research data. Journal of the Association for Information Science and Technology, 63(6), 1059–1078. https://doi.org/10.1002/ asi.22634. Brakewood, B., & Poldrack, R. A. (2013). The ethics of secondary data analysis: Considering the application of Belmont principles to the sharing of neuroimaging data. 
NeuroImage, 82, 671–676. https://doi.org/10.1016/j.neuroimage.2013.02.040. Brettle, A. (2009). Reviews and evidence based library and information practice. Evidence Based Library and Information Practice, 4(1), 43–50. https://doi.org/10.18438/ B8N613. Brown, S., Bruce, R., & Kernohan, D.. Directions for research data management in UK universities. (2015). Retrieved from http://repository.jisc.ac.uk/5951/4/JR0034_RDM_ report_200315_v5.pdf (Accessed October 5, 2017). Bull, S., Cheah, P. Y., Denny, S., Jao, I., Marsh, V., Merson, L., ... Wassenaar, D. (2015). Best practices for ethical sharing of individual-level health research data from lowand middle-income settings. Journal of Empirical Research on Human Research Ethics, 10(3), 302–313. https://doi.org/10.1177/1556264615594606. Burgi, P. Y., Blumer, E., & Makhlouf-Shabou, B. (2017). Research data management in Switzerland: National efforts to guarantee the sustainability of research outputs. IFLA Journal, 43(1), 5–21. https://doi.org/10.1177/0340035216678238. Cahill, J. M., & Passamano, J. A. (2007). Full disclosure matters. Near Eastern Archaeology, 70(4), 194–196. https://doi.org/10.2307/20361332. Chandramohan, D., Shibuya, K., Setel, P., Cairncross, S., Lopez, A. D., Murray, C. J., & Binka, F. (2008). Should data from demographic surveillance systems be made more widely available to researchers? PLoS Medicine, 5(2), e57. https://doi.org/10. 1371/journal.pmed.0050057. Charbonneau, D. H. (2013). Strategies for data management engagement. Medical Reference Services Quarterly, 32(3), 365–374. https://doi.org/10.1080/02763869. 2013.807089. Chen, X., & Wu, M. (2017). Survey on the needs for chemistry research data management and sharing. The Journal of Academic Librarianship, 43(4), 346–353. https://doi.org/ 10.1016/j.acalib.2017.06.006. Chigwada, J., Chiparausha, B., & Kasiroori, J. (2017). Research data management in research institutions in Zimbabwe. Data Science Journal, 16(31), 1–9. 
https://doi.org/ 10.5334/dsj-2017-031. Chiware, E., & Mathe, Z. (2016). Academic libraries' role in research data management services: A south African perspective. South African Journal of Libraries and Information Science, 81(2), 1–10. https://doi.org/10.7553/81-2-1563. Clement, R., Blau, A., Abbaspour, P., & Gandour-Rood, E. (2016). Team-based data management instruction at small liberal arts colleges. IFLA Journal, 43(1), 105–118. https://doi.org/10.1177/0340035216678239. Coady, S. A., & Wagner, E. (2013). Sharing individual level data from observational studies and clinical trials: A perspective from NHLBI. Trials, 14(1), 201. https://doi. org/10.1186/1745-6215-14-201. Cohn, J. P. (2012). Dataone opens doors to scientists across disciplines. BioScience, 62(11), 1004. https://doi.org/10.1525/bio.2012.62.11.16. Cook, D. J., Mulrow, C. D., & Haynes, R. B. (1997). Systematic reviews: Synthesis of best evidence for clinical decisions. Annals of Internal Medicine, 126(5), 376–380. https:// doi.org/10.7326/0003-4819-126-5-199703010-00006. Cooper, M. (2007). Sharing data and results in ethnographic research: Why this should not be an ethical imperative. Journal of Empirical Research on Human Research Ethics, 2(1), 3–19. https://doi.org/10.1525/jer.2007.2.1.3. Corti, L., Van den Eynden, V., Bishop, L., & Woollard, M. (2014). Managing and sharing research data: A guide to good practice. London: Sage. Costello, M. J. (2009). Motivating online publication of data. BioScience, 59, 418–427. Cragin, M. H., Palmer, C. L., Carlson, J. R., & Witt, M. (2010). Data sharing, small science and institutional repositories. Philosophical Transactions of the Royal Society, 368(1926), 4023–4038. Curty, R. G. (2015). Beyond “data thrifting”: An investigation of factors influencing research data reuse in the social sciences (unpublished doctoral dissertation). USA: Syracuse University. Retrieved from https://surface.syr.edu/etd/266/, Accessed
American Society for Information Science. Kambatla, K., Kollias, G., Kumar, V., & Grama, A. (2014). Trends in big data analytics. Journal of Parallel and Distributed Computing, 74(7), 2561–2573. Kaye, J., Terry, S. F., Juengst, E., Coy, S., Harris, J. R., Chalmers, D., ... Bezuidenhout, L. (2018). Including all voices in international data-sharing governance. Human Genomics, 12(1), 13. https://doi.org/10.1186/s40246-018-0143-9. Kitchenham, B. (2007). Guidelines for performing systematic literature reviews in software engineering. Durham, UK: Evidence-Based Software Engineering. Koltay, T. (2017). Data literacy for researchers and data librarians. Journal of Librarianship and Information Science, 49(1), 3–14. https://doi.org/10.1177/0961000615616450. Koopman, M. M. (2015). Data archiving, management initiatives and expertise in the Biological Sciences Department (Master's thesis). Cape Town, South Africa: University of Cape Town. Retrieved from http://open.uct.ac.za/handle/11427/13656 (Accessed June 20, 2017). Kostkova, P. (2018). Disease surveillance data sharing for public health: The next ethical frontiers. Life Sciences, Society and Policy, 14(1), 16. https://doi.org/10.1186/s40504-018-0078-x. Kowalczyk, S., & Shankar, K. (2011). Data sharing in the sciences. Annual Review of Information Science and Technology, 45, 247–294. https://doi.org/10.1002/aris.2011.1440450113. Levenson, D. (2010). When should pediatric biobanks share data? American Journal of Medical Genetics Part A. https://doi.org/10.1002/ajmg.a.33287. Matlatse, R. L. (2016). An evaluation of a structured training event aimed at enhancing the research data management (RDM) knowledge and skills of library and information science (LIS) professionals in South African higher education institutions (HEIs) (Unpublished master's thesis). South Africa: University of Pretoria. Matteson, M. L., Salamon, J., & Brewster, L. (2011). A systematic review of research on live chat service. 
Reference & User Services Quarterly, 51(2), 109–172. https://doi.org/ 10.5860/rusq.51n2.172. Mbuagbaw, L., Foster, G., Cheng, J., & Thabane, L. (2017). Challenges to complete and useful data sharing. Trials, 18(1), 71. https://doi.org/10.1186/s13063-017-1816-8. McCullough, B. D. (2009). Open access economics journals and the market for reproducible economic research. Econ Anal Policy, 39, 118–126. McKibbon, A. (2006). Systematic reviews and librarians. Library Trends, 55(1), 202–215. https://doi.org/10.1353/lib.2006.0049. Mennes, M., Biswal, B. B., Castellanos, F. X., & Milham, M. P. (2013). Making data sharing work: The FCP/INDIexperience. NeuroImage, 82, 683–691. https://doi.org/10.1016/ j.neuroimage.2012.10.064. Milia, N., Congiu, A., Anagnostou, P., Montinaro, F., Capocasa, M., Sanna, E., & Bisol, G. D. (2012). Mine, yours, ours? Sharing data on human genetic variation. PLoS ONE, 7(6), https://doi.org/10.1371/journal.pone.0037552 e37552. Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Annals of Internal Medicine, 151(4), 264–269. https://doi.org/10.1136/bmj.b2535. Mundel, T. (2014). Knowledge is power: Sharing information can accelerate global health impact. Retrieved from https://www.impatientoptimists.org/Posts/2014/11/ Knowledge-isPower#.WizthFWWbIU. Nelson, B. (2009). Data sharing: Empty archives. Nature, 461, 160–163. https://doi.org/ 10.1038/461160a. Ng'eno, E. J. (2018). Research data management in Kenya's agricultural research institutes (unpublished D. Phil. Thesis). KwaZulu-Natal, South Africa: University of KwaZuluNatal. Okoli, C., & Schabram, K. (2010). A guide to conducting a systematic literature review of information systems research. Sprouts: Working Papers on Information Systems, 10(26), Retrieved from https://pdfs.semanticscholar.org/31dc/ 753345d5230e421ea817dd7dcdd352e87ea2.pdf (Accessed March 27, 2019) . Ostell, J. (2009). 
Data sharing: Standards for bioinformatic cross-talk. Hum Mutat, 30, vii– vii https://doi.org/10.1002/humu.21013. Parr, C. S. (2007). Open sourcing ecological data. BioScience, 57, 309. https://doi.org/10. 1641/B570402. Pearce, N., & Smith, A. H. (2011). Data sharing: Not as simple as it seems. Environmental Health, 10(1), 107. https://doi.org/10.1186/1476-069X-10-107. Peng, C., Song, X., Jiang, H., Zhu, Q., Chen, H., Chen, J. M., ... Zhou, X. (2016). Towards a paradigm for open and free sharing of scientific data on global change science in China. Ecosystem Health and Sustainability, 2(5), https://doi.org/10.1002/ehs2.1225 e01225. Permanent Access to the Records in Europe. First insights into digital preservation of research output in Europe. (2009). http://libereurope.eu/wp-content/uploads/PARSE-Insight_ D3-5InterimInsightReport_final.pdf (Accessed October 9, 2018). Perrino, T., Howe, G., Sperling, A., Beardslee, W., Sandler, I., ... Cruden, G. (2013). Advancing science through collaborative data sharing and synthesis. Perspectives on Psychological Science, 8(4), 433–444. Pisani, E., & AbouZahr, C. (2010). Sharing health data: Good intentions are not enough. Bulletin of the World Health Organization, 88(6), 462–466. https://doi.org/10.2471/ BLT.09.074393. Pitt, M. A., & Tang, Y. (2013). What should be the data sharing policy of cognitive science? Topics in Cognitive Science, 5(1), 214–221. https://doi.org/10.1111/tops. 12006. Piwowar, H. A. (2011). Who shares? Who doesn't? Factors associated with openly archiving raw research data. PLoS ONE, 6(7), https://doi.org/10.1371/journal.pone. 0018657 e18657. Plengsaeng, B., Wehn, U., & van der Zaag, P. (2014). Data-sharing bottlenecks in transboundary integrated water resources management: A case study of the Mekong River Commission’s procedures for data sharing in the Thai context. Water International, 39(7), 933–951. https://doi.org/10.1080/02508060.2015.981783. Pryor, G. (2012). Managing research data. 
London: Facet Publishing.
Rohlfing, T., & Poline, J. B. (2012). Why shared data should not be acknowledged on the author byline. NeuroImage, 59, 4189–4195. https://doi.org/10.1016/j.neuroimage.2011.09.080.
Ross, J. S. (2016). Clinical research data sharing: What an open science world means for researchers involved in evidence synthesis. Systematic Reviews, 5(1), 159. https://doi.org/10.1186/s13643-016-0334-1.
Rowhani-Farid, A., & Barnett, A. G. (2016). Has open data arrived at the British medical journal (BMJ)? An observational study. BMJ Open, 6(10), e011784. https://doi.org/10.1136/bmjopen-2016-011784.
Rowhani-Farid, A., Allen, M., & Barnett, A. G. (2017). What incentives increase data sharing in health and medical research? A systematic review. Research Integrity and Peer Review, 2(1), 4. https://doi.org/10.1186/s41073-017-0028-9.
Royal Society (2012). Science as an open enterprise: Open data for open science. London: The Royal Society Science Policy Centre.
Sansone, S. A., & Rocca-Serra, P. (2012). On the evolving portfolio of community-standards and data sharing policies: Turning challenges into new opportunities. GigaScience, 1, 1–3. https://doi.org/10.1186/2047-217X-1-10.
Savage, C. J., & Vickers, A. J. (2009). Empirical study of data sharing by authors publishing in PLOS journals. PLoS ONE, 4, e7078. https://doi.org/10.1371/journal.pone.0007078.
Sayogo, D. S., & Pardo, T. A. (2013). Exploring the determinants of scientific data sharing: Understanding the motivation to publish research data. Government Information Quarterly, 30, S19–S31. https://doi.org/10.1016/j.giq.2012.06.011.
Schmidt, B., Gemeinholzer, B., & Treloar, A. (2016). Open data in global environmental research: The Belmont Forum's open data survey. PLoS ONE, 11(1), e0146695. https://doi.org/10.1371/journal.pone.0146695.
Schöpfel, J., Ferrant, C., André, F., & Fabre, R. (2018). Research data management in the French National Research Center (CNRS). Data Technologies and Applications, 52(2), 248–265. https://doi.org/10.1108/DTA-01-2017-0005.
Schumacher, J., & VandeCreek, D. (2015). Intellectual capital at risk: Data management practices and data loss by faculty members at five American universities. International Journal of Digital Curation, 10(2), 96–109.
Scott, M. (2014). Research data management (Doctoral dissertation). United Kingdom: University of Southampton. https://doi.org/10.5258/SOTON/374711.
Shakeri, S. (2013). Data curation perspectives and practices of researchers at Kent State University's Liquid Crystal Institute: A case study (Doctoral dissertation). USA: Kent State University. Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=kent1385382943 (Accessed September 2, 2017).
Sheather, J. (2009). Confidentiality and sharing health information. BMJ, 338, b2160. https://doi.org/10.1136/bmj.b2160.
Specht, A., Guru, S., Houghton, L., Keniger, L., Driver, P., Ritchie, E. G., ... Treloar, A. (2015). Data management challenges in analysis and synthesis in the ecosystem sciences. Science of the Total Environment, 534, 144–158.
Steen, R. G. (2011). Retractions in the scientific literature: Is the incidence of research fraud increasing? Journal of Medical Ethics, 37(4), 249–253. https://doi.org/10.1136/jme.2010.040923.
Takashima, K., Maru, Y., Mori, S., Mano, H., Noda, T., & Muto, K. (2018). Ethical concerns on sharing genomic data including patients' family members. BMC Medical Ethics, 19(1), 61. https://doi.org/10.1186/s12910-018-0310-5.
Teeters, J. L., Harris, K. D., Millman, K. J., Olshausen, B. A., & Sommer, F. T. (2008). Data sharing for computational neuroscience. Neuroinformatics, 6(1), 47–55. https://doi.org/10.1007/s12021-008-9009-y.
Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., ... Frame, M. (2011). Data sharing by scientists: Practices and perceptions. PLoS ONE, 6(6), e21101. https://doi.org/10.1371/journal.pone.0021101.
Tucker, J. (2009). Motivating subjects: Data sharing in cancer research (unpublished doctoral dissertation). Virginia Polytechnic Institute and State University. VA: Blacksburg.
Ullah, A., & Ameen, K. (2018). Account of methodologies and methods applied in LIS research: A systematic review. Library & Information Science Research, 40(1), 53–60. https://doi.org/10.1016/j.lisr.2018.03.002.
Van Horn, J. D., & Gazzaniga, M. S. (2013). Why share data? Lessons learned from the fMRIDC. NeuroImage, 82, 677–682. https://doi.org/10.1016/j.neuroimage.2012.11.010.
Volk, C. J., Lucero, Y., & Barnas, K. (2014). Why is data sharing in collaborative natural resource efforts so hard and what can we do to improve it? Environmental Management, 53(5), 883–893. https://doi.org/10.1007/s00267-014-0258-2.
Wallis, J. C., Rolando, E., & Borgman, C. L. (2013). If we share data, will anyone use them? Data sharing and reuse in the long tail of science and technology. PLoS ONE, 8, e67332. https://doi.org/10.1371/journal.pone.0067332.
Walters, T., & Skinner, K. (2011). New roles for new times: Digital curation for preservation. Washington, DC: Association of Research Libraries.
Watson, M. (2015). When will ‘open science’ become simply ‘science’? Genome Biology, 16(1), 101. https://doi.org/10.1186/s13059-015-0669-2.
Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data too? Intelligence, 40, 73–76.
Wiley, C. (2018). Data sharing and engineering faculty: An analysis of selected publications. Science & Technology Libraries, 37(4), 409–419. https://doi.org/10.1080/0194262X.2018.1516596.
Wittenberg, J., & Elings, M. (2017). Building a research data management service at the University of California, Berkeley: A tale of collaboration. IFLA Journal, 43(1), 89–97. https://doi.org/10.1177/0340035216686982.
Woolfrey, L. (2009). Archiving social survey data in Africa: An overview of African microdata curation and the role of survey data archives in data management in Africa (Unpublished doctoral dissertation). South Africa: University of Cape Town.
Yoon, A. (2015). Data reuse and users' trust judgments: Toward trusted data curation (Unpublished doctoral dissertation). Chapel Hill, NC: The University of North Carolina at Chapel Hill. Retrieved from https://cdr.lib.unc.edu/record/uuid:2c2268b3-88cf-4397-b038-b39e88f80d83.
Yoon, A., & Schultz, T. (2017). Research data management services in academic libraries in the US: A content analysis of libraries’ websites. College & Research Libraries, 78, 920–933. https://doi.org/10.5860/crl.78.7.920.
Zimmerman, A. S. (2008). New knowledge from old data: The role of standards in the sharing and reuse of ecological data. Science, Technology & Human Values, 33, 631–652. https://doi.org/10.2307/29734058.
Zvyagintseva, L. (2015). Articulating a vision for community-engaged data curation in the digital humanities (Doctoral thesis). Canada: University of Alberta. https://doi.org/10.7939/R3M66S.

Winner Dominic Chawinga is a doctoral student in the Department of Library and Information Science at the University of the Western Cape, South Africa. A winner of the 2015 IFLA Library and Information Science Student Paper Award, Winner holds a master's degree in library and information studies from the University of the Western Cape, South Africa. His research interests cross-cut scholarly communication, ICT pedagogy and digital curation. Winner has published 20 journal articles, some of which have appeared in journals indexed in Scopus, namely Information Development, Business Information Review, International Journal of Educational Technology in Higher Education, Research in Comparative and International Education, E-Learning and Digital Media, International Review of Research in Open and Distributed Learning, and International Journal for Educational Integrity.

Sandy Zinn is an Associate Professor and Chairperson of the Department of Library and Information Science, University of the Western Cape, South Africa. She holds a PhD in information studies from the University of KwaZulu-Natal, South Africa. Her research interests are wide and varied and include curriculum connections with the school library, e-learning, information behaviour, social media in libraries, and scholarly communication. She has authored several publications including journal articles, conference proceedings and book chapters and she serves on a variety of editorial boards of international journals and conference proceedings.