Researchers have shared data with their peers for centuries. Sharing data allows verification of results, compilations of data into larger synthesis studies, reinterpretation of existing data in the light of new hypotheses, and many more uses to advance science. In the past data were published as part of the original publication, primarily in the form of data tables. Over time the size of the data sets used in a scientific publication grew, often prohibiting their publication as printed data tables. Also, not all data can be represented in tabular form. Journal publishers started to cite page limits as a reason to exclude data tables from publications. As a result, data used as the basis of a publication are rarely published anymore. This development created a structural barrier to the publication of data (Klump et al. 2006).
The emergence of the internet made the sharing of data potentially much easier and gave rise to the expectation that this would change the way in which we conduct research. While for some communities the internet made the sharing of data possible at scales never imagined, the overall effect on sharing of data was rather small. And while people started to develop a culture of sharing content on the internet through social media, wikis, peer-to-peer networks and other media, cultural attitudes towards the sharing of data in research did not change much.
It has been proposed that cultural change would happen through generational change (British Library, HEFCE, and JISC 2012). Perhaps the ‘digital natives’ of ‘Generation Y’ will be more open to sharing their data? A survey in the United Kingdom asked over 17,000 doctoral students about their attitudes and behaviours, including the sharing of their research data (British Library, HEFCE, and JISC 2012). The survey showed that future researchers behaved in their private lives as would be expected from other members of their cohort, being quite open to the idea of sharing resources. However, when it came to their behaviours in their academic environment they shared the views and followed the behaviours of their supervisors, whom they regarded as successful role models. Other researchers dispute the idea of ‘digital natives’ altogether, arguing that media have changed rather than their users’ general practices and dispositions (Selwyn 2009). More recent studies (Van den Eynden et al. 2016; Tenopir et al. 2015) even showed that early career researchers are less likely to share data because they fear losing future publication opportunities by making their data available.
Even though there was no observable difference between generations of researchers in their willingness to share data, profound differences can be seen between different research communities. Some communities such as high energy physics, astronomy, genomics, climate research and parts of the geosciences have built infrastructures that successfully enable sharing of research data at large scales. What distinguishes these communities from others? This paper sets out to analyse and discuss the social drivers that influence data sharing in science and to suggest possible points of intervention to make data sharing a mutually beneficial practice for both data producers and data users.
Open Access to Data
Online access over the internet lowered the technical barriers to accessing knowledge, but reality fell short of expectations. This discrepancy between expectations for broader access to knowledge and the barriers still encountered led to the formation of initiatives to promote access to knowledge, culminating in the ‘Berlin Declaration for Access to Knowledge in the Sciences and Humanities’ (Berlin Declaration 2003), which has since been signed by 572 institutions. In this declaration, the signatories call for open access not only to scholarly literature but also to research data. The Berlin Declaration was followed by the ‘Recommendations for Access to Data from Publicly Funded Research’ issued by the Organisation for Economic Cooperation and Development (OECD 2006), which have since been implemented as policies in the OECD member states.
The policy papers on the importance of making research data available appeal to the common good but make few suggestions as to how the desired cultural change is to be achieved, and they have not led to a large-scale change in data sharing practices (Kratz and Strasser 2015). Statements by researchers on their willingness to share data come with the best intentions but rarely go beyond lip service and bear little resemblance to the actual release of data (Fecher, Friesike, and Hebing 2015).
Discussions about sharing data in science often revolve around ‘finding the right incentives’ (e.g. Nelson 2009; Borgman 2012). Given the important role of publication and citation in scholarship it is widely assumed that formal data publication would be an incentive for researchers to make data available (Costello 2009; Fecher, Friesike, and Hebing 2015; Kratz and Strasser 2015; Van den Eynden et al. 2016).
Studies by Piwowar and Vision (2013), Sears (2011), and others show that publications that have publicly accessible data accompanying them are cited more frequently and over a longer period of time than publications without access to the underlying data. This citation advantage, however, takes many years to show effect (Sears 2011) and is less than expected (Piwowar and Vision 2013). Inconsistent citation practices for data may contribute to underestimating the impact of data publications (Belter 2014).
Data publication follows the forms developed for the publication of research papers, and both similar and new metrics are being developed to gauge its impact. But is it the data or the intellectual work that we are interested in as peers? Does the recognition gained by data publication merit the additional effort?
Data publication may not be the only form of sharing data (Parsons and Fox 2013) but it might serve as a proxy for the degree to which Open Access policies have changed researchers’ behaviours.
It is difficult to find comprehensive numbers for the total of all data publications. In this discussion I want to focus on the fields of science, technology and medicine (STM) because research data in these fields are primarily produced by the researchers themselves. And because in these fields formal ways for data publication have existed for a couple of years now, the practices and the evolution of data sharing through publication are best documented. Formal data publication is also a useful proxy for the availability of data because studies have shown that data not published through formal channels become unavailable very quickly (Vines et al. 2014). This paper therefore focuses on data that stand a good chance of being discoverable by users and still being available.
In the years 2005 to 2016 roughly 30 million STM papers were published (Ware and Mabe 2015). A certain proportion of data are published as supplementary materials on the publishers’ or project websites and through other informal pathways, but as noted above, these forms of making data available are ephemeral. The volume of formal data publications through DataCite for the time period 2005 to 2016 is approximately 2.6 million data publications (THOR Project 2016). Of these 2.6 million data sets, 800,000 are from the marine environmental sciences and have been published through PANGAEA (THOR Project 2016). GenBank has about 260,000 entries for published genomic data (Benson et al. 2013). About 16,400 PURL identifiers are being used in the entire scholarly record indexed by Google Scholar. Of these, fewer than 5,000 seem to identify digital objects like data, while most seem to identify semantic concepts. Life Science Identifiers (LSID) were introduced in 2004 as a way of naming and identifying data resources stored in multiple, distributed data stores for life sciences research. A search using Google Scholar locates about 14,000 LSIDs that have been used in the scientific literature.
Not all publications come with data, but the majority of STM publications do. Conversely, not all published data are being used in publications. Also, in medicine, psychology and empirical social studies the involvement of human subjects limits the sharing of data. Still, comparing 30 million STM publications to three million data publications shows us that data sharing through research data repositories is still not the norm (e.g. Baronchelli et al. 2006; ‘Share Alike’ 2014).
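As a rough check on the ‘three million’ figure, the counts quoted above can simply be added up. The following is only an illustrative sketch of that arithmetic, not a method used in the cited studies. It assumes that the four identifier systems do not overlap and that all counts are comparable to the 2005 to 2016 publication window, neither of which is established in the text. The PANGAEA figure is left out because it is already part of the DataCite total, and the PURL figure is an upper bound.

```python
# Counts as quoted in the text; variable names are illustrative only.
stm_papers = 30_000_000  # STM papers published 2005 to 2016

# Assumption: these sources do not overlap and cover a comparable period.
# PANGAEA's 800,000 data sets are omitted because they are already
# included in the DataCite total.
data_publications = {
    "DataCite": 2_600_000,
    "GenBank": 260_000,
    "PURL (upper bound, 'fewer than 5,000')": 5_000,
    "LSID": 14_000,
}

# Upper-bound estimate of formally published data sets.
total_data = sum(data_publications.values())

# Data publications per STM paper, expressed as a fraction.
share = total_data / stm_papers

print(f"Data publications (upper bound): {total_data:,}")
print(f"Share relative to STM papers: {share:.1%}")
```

Under these assumptions the total comes to roughly 2.9 million data publications, or just under one data publication for every ten STM papers.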
Are We Getting the Incentives Right?
Looking at the comparatively low numbers of data publications we have to ask the question: are we offering the right incentives to researchers to share their data with others more freely? Data citation does not seem to be a draw card. At the same time, neither does generational change seem to lead to a cultural change. Are there more fundamental social drivers that determine what are incentives or disincentives to researchers to share data, or not?
Data publication means sharing data with an anonymous data user. Surveys among researchers show that citation as a form of recognition is very important but might not be sufficient. A large proportion of researchers asked for mechanisms that identify the consumers of their data publications by means of registration, etc., in some cases even asking for co-authorship (Tenopir et al. 2011).
Gift Culture in Science
In ‘Gift Giving as an Organizing Principle in Science’, Hagstrom (1982) presents an account of the ‘gift-giving’ nature of scientific contribution to journals, in which information is traded for recognition, which in turn is thought to motivate scientists. Hagstrom argues that such an exchange system generates commitment to social norms among peers. Ethnographic observations among groups of researchers reported by Wallis, Rolando, and Borgman (2013) are interpreted by the authors as supporting Hagstrom’s hypothesis that scholarship is characterised by a gift culture in which members of the community make each other precious gifts.
This exchange of goods in a gift culture is not organised as a barter but as an exchange of precious gifts with the expectation of receiving a precious gift in return sometime in the future (Mauss 2011). In the case of the scholarly community precious gifts could be an invitation to speak at a conference, referrals of talented students, access to instruments and other resources, pre-prints of papers in the pre-digital days, and, last but not least, access to data. Putting data on the internet for anonymous users without being able to expect a gift in return is not an incentive in this model of scholarly culture as this violates the principle of reciprocity that is fundamental to the gift culture.
Social Capital in the Scholarly Community
The elements of interaction between research peers can also be described as elements of a researcher’s social capital. Unlike Coleman (1988), Bourdieu (1983) does not define social capital as a characteristic of an entire social group, but as a means for an individual to influence social transactions and rise in social rank. In the context of the scholarly community, data are a form of social capital. Controlled sharing of data with peers adds power to the network of obligations, expectations and trustworthiness of social structures among peers.
The interpretation of science as a gift culture has been disputed by Latour and Woolgar (1982), who argue that science actually has a currency, which they identify as credibility. This currency can be transformed into other forms of capital, which they identify as money, data, prestige, credentials, problem areas, argument, papers, and so on. Latour and Woolgar distinguish between reward and credibility, focusing on the difference between the process by which reward is bestowed and the process by which credibility is assessed. Both reward and credibility originate from how a researcher’s work is received by his or her peers, which can also be seen as the peers’ trust in the researcher’s ability to produce valuable research in the future.
The system of credibility and reward in the scholarly community has been described by Fecher et al. (2015) as a reputation economy. In this model of a reputation economy publishing a paper leads to reception by peers. A positive reception will result in an increase in reputation among peers and thus increases the likelihood of being awarded funding for future research. Research funding gives access to equipment and other resources needed to acquire new data. These data can then be discussed and deliver new arguments for the researcher’s engagement in the scientific discourse and the results can then be published. The publication is then received by the researcher’s peers, a positive reception adding to the reputation of the researcher. And so the cycle of the reputation economy continues. Success is measured by the efficiency of conversion of one form of capital into another. The elements of the reputation economy, and how these relate to each other, are summarised in Figure 1.
Reputation vs. Collaboration
Every researcher has a personal motivation that brought him or her to choose a career in research. To follow their pursuits in research, individuals need access to resources, even if only their own salary. Access to resources is strongly influenced by the individual’s reputation among his or her peers. To rise to the highest ranks in the academic system a researcher needs to deliver not only good and solid work but ‘exceptional’ work that bestows on him or her the reputation of being a ‘distinguished’ researcher. This quest to distinguish oneself from one’s peers makes research a highly competitive pursuit (Hagstrom 1971; Haeussler 2011).
At the same time research is becoming a more and more collaborative exercise (Kowalczyk and Shankar 2011). There are fields where the required resources go beyond the scale of individual grants. Projects like the Large Hadron Collider at CERN or ocean-going research cruises extend well beyond the means of an individual researcher, sometimes even beyond the means of entire nations. There is no reputation to be gained in these fields without collaboration. In this system the researcher has to strike a balance between reputation gain and collaboration gain. It is in these fields that require a large degree of collaboration that we see the most advanced technical and cultural mechanisms for sharing research data. Several examples are given in a report by the CODATA-ICSTI Task Group on Data Citation Standards and Practices (2013).
Considering these two opposing drivers, reputation and collaboration, a researcher’s behaviour is strongly influenced by the trade-off between reputation gain and collaboration gain. The fundamental difference between disciplines is the trade-off between reputation and collaboration at points of the reputation economy where changes in the form of capital occur. Sharing data as a form of collaboration must be balanced by a similar gain in reputation. At the same time, collaborative disciplines enforce data sharing as a social norm where non-compliance will result in some form of penalty like exclusion from access to collaborative resources.
Acknowledging the Social Dynamics
The relatively low numbers of data publications show that appeals to the common good have little effect. At the same time there are almost no means to apply a ‘carrot and stick’ approach to enforce new norms with respect to sharing data. In the academic environment researchers enjoy many freedoms and few sanctions. ‘Carrot and stick’ might be the wrong metaphor as it assumes that the metaphorical horse is tied to the cart and carrot and stick can be used to make it move in the desired direction. However, the reality of the situation in academia might be more like the zebra out on the savanna. Here, waving carrot or stick will not have much of an effect.
The strongest norms for researchers are imposed by their peers rather than by their home institutions or by funding bodies. It has long been known that there is little point in setting behavioural norms if there are no means or no willingness to enforce these norms (Spittler 1967). Bringing about changes in social norms around data will require taking into account the social norms of the reputation economy, and the balance between reputation gain and collaboration gain. Understanding these dynamics will allow us to identify suitable points of intervention that will influence behaviours.
To stay with the animal metaphor, the most effective strategy to engage with the animals on the savanna is to find a suitable watering hole. Watering holes are critical points in the savanna ecosystem that all animals have to pass. In this sense, we need to identify suitable points of intervention that are critical points in the reputation economy described earlier (see also Figure 1) to achieve broader access to data. These interactions must be designed in such a way that collaboration enables or adds to reputation gain.
A follow-up of data management plans by funding organisations would be such a point of intervention. Funding rules could mandate that, after some embargoed period of exclusive use, data must be made accessible through accredited repositories and that non-compliance may lead to some form of penalty. However, some research funding organisations see policing of data policies as beyond their means and such policies have until now proven to be relatively ineffective.
Another point of intervention would be collaboratively used resources like access to expensive and rare instruments. Renewed access to this resource could be coupled to making data and their interpretation accessible to others, if necessary after an embargo period. As an example, the European Synchrotron Radiation Facility states in their data policy: “Acceptance of this policy is a condition for the award of beamtime” (European Synchrotron Radiation Facility 2015).
Making data available requires additional effort, which must be a worthwhile investment; Fecher, Friesike, and Hebing (2015) therefore argue that publishing data must add to reputation. And, of course, a powerful vector for gaining reputation is publication and reception by peers. Coupling high-quality publications with data access will offer a strong incentive to not only make some data available, but to make high-quality data available (Wallis, Rolando, and Borgman 2013). This would also require that publishers and journal editors follow up on their own rules on data availability (Alsheikh-Ali et al. 2011; Stodden, Guo, and Ma 2013).
Conclusions and Outlook
Sharing of research data has not reached levels comparable to the annual output of research papers published in science, technology and medicine. Recognition of data sharing through data citation does not seem to offer strong enough incentives for researchers to change their practices. The focus on the publication paradigm ignores other forms of sharing (Parsons and Fox 2013) and therefore overlooks their associated motivations (Tenopir et al. 2015).
To be able to move towards broader sharing of data it is important to understand what drives researchers’ behaviours with respect to sharing their data with their peers in order to devise effective research data policies. These policies need to recognise that to researchers data are a form of social capital they will strategically invest in the reputation economy that characterises the scholarly community. A request for collaboration will have to be offset by a comparable reputation gain to merit this investment. This is why we see more sharing of data in disciplines that are more collaborative in nature.
Applying the model of a reputation economy in research allows us to identify suitable points of intervention where the request for data sharing is balanced by a gain in reputation. These points could be enforced data policies around funding, access to research infrastructures, and the publication process.
Sharing data is not a purely technical issue that can be solved by building research data infrastructures. Over the last decade, policies around access to research data have evolved with surprisingly little input from social science studies on this subject. More research is needed into which social drivers influence researchers’ behaviours with respect to sharing data. The studies of Haeussler (2011), Wallis, Rolando, and Borgman (2013) and Fecher, Friesike, and Hebing (2015) already give us some important insights.