RESEARCH PAPER
A Survey of Researchers’ Needs and Priorities for Data Sharing

One of the ways in which the publisher PLOS supports open science is via a stringent data availability policy established in 2014. Despite this policy, and more data sharing policies being introduced by other organizations, best practices for data sharing are adopted by a minority of researchers in their publications. Problems with effective research data sharing persist, and previous research has characterised these problems as a lack of time, resources, incentives, and/or skills to share data. In this study we built on this research by investigating the importance of tasks associated with data sharing, and researchers’ satisfaction with their ability to complete these tasks. By investigating these factors we aimed to better understand opportunities for new or improved solutions for sharing data. In May-June 2020 we surveyed researchers from Europe and North America, asking them to rate tasks associated with data sharing on (i) their importance and (ii) their satisfaction with their ability to complete them. We received 617 completed responses. We calculated mean importance and satisfaction scores to highlight potential opportunities for new solutions and to compare different cohorts. Tasks relating to research impact, funder compliance, and credit had the highest importance scores. 52% of respondents reuse research data but the average satisfaction score for obtaining data for reuse was relatively low. Tasks associated with sharing data were rated somewhat important and respondents were reasonably well satisfied in their ability to accomplish them. Notably, this included tasks associated with best data sharing practice, such as use of data repositories. However, the most common method for sharing data was in fact via supplemental files with articles, which is not considered to be best practice.

We presume that researchers are unlikely to seek new solutions to a problem or task that they are satisfied in their ability to accomplish, even if many do not attempt this task. This implies there are few opportunities for new solutions or tools to meet these researcher needs. Publishers can likely meet these needs for data sharing by working to seamlessly integrate existing solutions that reduce the effort or behaviour change involved in some tasks, and focusing on advocacy and education around the benefits of sharing data.

INTRODUCTION
PLOS introduced a strong data availability policy in 2014 requiring all authors to make the research data that support their results publicly available without restriction, with rare exceptions. The availability of research data supporting scholarly publications is increasing slowly (Colavizza et al. 2020), and since 2015 many journals and publishers have also introduced journal data sharing policies. Policies contribute to increased availability of research data, but new solutions may be needed to further accelerate best practice in data sharing, in compliance with the Findable, Accessible, Interoperable and Reusable (FAIR) principles (Wilkinson et al. 2016).
Aspects of researchers' experiences and attitudes about sharing research data are, relative to other aspects of open science, well-studied. A meta-synthesis of 45 qualitative studies of researchers' practices and perceptions about data sharing found that researchers lack time, resources, and skills to effectively share their data in public repositories (Perrier et al. 2020). In previous research, a lack of suitable infrastructure for data sharing is commonly cited as a barrier to the availability of research data along with a lack of incentives (Science et al. 2019).
Considering the results of previous studies (Allagnat et al. 2019; Borghi & Van Gulick 2018; Eynden et al. 2016; Federer et al. 2018; Houtkoop et al. 2018; Kratz et al. 2015; Open Data 2017; Rathi et al. 2012; Schmidt et al. 2016; Science et al. 2017; Stuart et al. 2018; Tenopir et al. 2011), beyond infrastructure, researchers' concerns about misuse and scooping (lost publication opportunities) are amongst the most common concerns about, and barriers to, data sharing. In previous research, these concerns are followed in frequency by more practical concerns about copyright and licensing (ownership) and the time and effort required to make research data openly available. The mounting evidence of the common concerns about data sharing, considered in isolation, might suggest a substantial need for new solutions to address these concerns, but the importance of these problems to researchers and researchers' ability to easily solve them is less clear.
In principle, there are numerous solutions available for the problems represented in these findings, from repositories, institutional research support, and training programmes to journal policies and procedures. Yet in practice, to give one example of FAIR data practice, only about a fifth of researchers use repositories to share data when published articles are analysed (Colavizza et al. 2020). Most authors, including PLOS authors, who share research data publicly choose to share data as supporting information files with their published articles (Stuart et al. 2018).
Conversations with PLOS authors have suggested researchers favour the convenience of sharing data as supplemental files (supporting information) with their papers, and, consistent with some surveys, view journals and publishers as trustworthy stewards of their research data (Science et al. 2019). While common problems with data sharing have been repeatedly identified, there is little evidence on how important each of these problems is, and if and how well existing tools, products, and services help to solve these problems.
Adapting an approach to user research and surveying rooted in Jobs To Be Done theory (Christensen et al. 2016), we sought to understand how important different tasks ("factors") associated with data sharing are to researchers, and how satisfied researchers are with their ability to complete that task ("factor"). Part of our motivation for this research was exploring opportunities for new products, partnerships or services that support better data sharing practices by researchers, in particular PLOS authors. We also believed this approach would illuminate the likelihood that researchers will adopt new solutions, providing insight not available from previous studies.
We hypothesised that researchers had unmet needs, which would be represented by factors rated as important but unsatisfied, in tasks relating to:
- Preparing, managing, publishing and understanding reuse of their research data
- Compliance with the data sharing policies of funding agencies, institutions and journals
- Their ability to obtain and access other researchers' data for reuse

There may however be opportunities (unmet researcher needs) in relation to better supporting data reuse, which could be met in part by strengthening data sharing policies of journals and publishers, and improving the discoverability of data associated with published articles.

Hrynaszkiewicz et al. Data Science Journal DOI: 10.5334/dsj-2021-031

RECRUITMENT
Our recruitment plan utilized a wide range of channels to reach researchers. This strategy leveraged (a) direct email campaigns, (b) promoted Facebook and Twitter posts, (c) a post on the PLOS Blog, and (d) emails to industry contacts who distributed the survey on our behalf. URL variables were assigned to track the efficacy of each recruitment channel.
Participation was incentivized with 3 random prize draws, each with a $200 prize. The prize draw was managed via a separate survey to maintain anonymity, and 559 of 728 eligible participants entered the prize draw.
The survey received 617 completed responses, although 1,477 people answered at least part of the survey.
The effectiveness of these recruiting methods varied widely. Of the participants who completed the survey, nearly 80% were recruited via direct email campaigns, with the overwhelming majority of these coming in response to a dedicated message about the survey.
Given the importance of our direct email campaigns, it is unsurprising that our cohort was composed largely of former PLOS authors, accounting for 82% of the users who completed the survey.

SURVEY INSTRUMENT
The individual factors associated with data sharing tasks were recast into outcome statements, which represent a researcher's hypothetical success measures for completing a task. Statements were identified by considering the policies, procedures and tasks associated with various aspects of research data management and publishing, the solutions and support currently available to researchers to complete these tasks, and were further informed by conversations with researchers to develop the survey. The outcome statements were constructed with a standard syntax, to ensure that they could be usefully compared to each other.
Typically, a desired outcome statement is composed of a direction of change, a metric of change, an object of change, and an optional context. For example, in "Spend less time creating a data availability statement", "spend less" is the direction, "time" is the metric, and "data availability statement" is the object. The context in this case is provided in the associated survey question phrase, "when submitting your research for peer review".
In some cases we have forgone the direction when the context is sufficient to define the goal the researcher is trying to achieve such as "My research data has its own Digital Object Identifier (DOI)" (context from associated question: "when preserving or archiving your data"). While the resulting statements diverge from standard practice, it should not impact how well they can be tested in survey work, or their usefulness in identifying researcher needs.
Using these statements, we constructed a survey in SurveyGizmo, now known as Alchemer, which measured how important a researcher thought each task was, and their level of satisfaction with being able to complete it. Before deployment, the survey was tested by individuals with scientific backgrounds who were not involved with the study, to ensure it was understandable.
Importance was measured using a five-point unipolar scale, which was later mapped to a value from 0 to 100.
Since satisfaction can be expressed in negative terms, it needs a different scale that can account for this. Therefore, satisfaction was measured using a seven-point bipolar scale, also mapped to a value from 0 to 100.
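As an illustration of how such a mapping could work, the sketch below converts scale responses to 0-100 scores. It assumes an evenly spaced (linear) mapping with responses coded 1 to N; the exact values assigned to each scale point in the study are not reproduced in this text, so the function name `scale_to_score` and the coding are hypothetical.

```python
from statistics import mean


def scale_to_score(response: int, points: int) -> float:
    """Map a response coded 1..points linearly onto 0..100.

    Assumption: scale points are evenly spaced; the study's actual
    mapping may differ.
    """
    if not 1 <= response <= points:
        raise ValueError(f"response must be between 1 and {points}")
    return 100.0 * (response - 1) / (points - 1)


# Hypothetical responses to one factor (not data from the survey).
importance_responses = [5, 4, 3, 4]        # five-point unipolar scale
satisfaction_responses = [2, 4, 5, 3]      # seven-point bipolar scale

mean_importance = mean(scale_to_score(r, 5) for r in importance_responses)
mean_satisfaction = mean(scale_to_score(r, 7) for r in satisfaction_responses)

print(f"mean importance:   {mean_importance:.1f}")
print(f"mean satisfaction: {mean_satisfaction:.1f}")
```

Under this assumed mapping, the midpoint of each scale (3 of 5, or 4 of 7) lands on 50, which matches the neutral value used when plotting the scores.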

DATA ANALYSIS PROCESS
We are most likely to see a need for a new solution when we identify user needs that are both important and underserved, measured by importance and satisfaction scores. We can see the relationship of importance and satisfaction by mapping the mean importance and satisfaction scores for each factor on a scatter plot, with importance on the y-axis, and satisfaction on the x-axis (Figure 1). On each axis, a neutral response is mapped to 50. Viewed in this way, the factors that map to the upper-left quadrant indicate opportunities for new solutions, as they are generally regarded by researchers as both important and underserved.
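The quadrant reading described above can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in the study: the function name `quadrant` and the labels are hypothetical, and the example scores are mean values quoted elsewhere in this article.

```python
NEUTRAL = 50.0  # a neutral response maps to 50 on both axes


def quadrant(importance: float, satisfaction: float) -> str:
    """Place a factor on the importance (y) vs satisfaction (x) plot."""
    vertical = "upper" if importance > NEUTRAL else "lower"
    horizontal = "left" if satisfaction < NEUTRAL else "right"
    return f"{vertical}-{horizontal}"


# (mean importance, mean satisfaction) as reported in the results
examples = {
    "Increase the likelihood that my research benefits science": (85.0, 54.9),
    "Spend less time searching for articles with reusable datasets": (61.4, 41.3),
}

for name, (imp, sat) in examples.items():
    # "upper-left" marks a factor that is important and underserved
    print(f"{quadrant(imp, sat):12s} {name}")
```

This check uses mean scores only; whether a factor sits in a quadrant with statistical confidence also depends on the confidence intervals around those means.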

Ethical considerations
We did not obtain approval from a research ethics committee as the research was considered to be low risk and we did not collect sensitive information about the participants. All data were collected anonymously. Participants were informed that their participation in this survey was completely voluntary, and that they were free to withdraw from the study at any time until they submitted their response. Answers will never be associated with individual participants and the results will only be analyzed in aggregate. The data collection procedures and survey tool were compliant with the General Data Protection Regulation 2016/679.

SURVEY PARTICIPANTS
The survey received 617 completed responses. The distribution of respondents by discipline and career stage is very similar when comparing the completed cohort with the whole cohort (completed and partial responses). Respondents self-identified their career stage in question 5 of the survey. Responses from the 617 individuals who completed the survey (those who answered all questions) are used in our analysis (Table 1).
Over half of the respondents were from Biology and Life Sciences or Medicine and Health Sciences disciplines. The cohorts from Physical Sciences, Engineering and Technology, and Earth Sciences are small (with a maximum of 18 completed responses) (Figure 2). The largest proportion of survey respondents self-identified as Early-Career researchers (45%), followed by Mid-Career researchers (36%) and Late-Career researchers (18%). The majority of survey respondents were from North America (79%).

Table 1
Survey respondent demographics.

Figure 2
The most common discipline of respondents who completed the survey was Biology and Life Sciences, followed by Medicine and Health Science.

Data sharing approaches
Respondents were asked to select all the methods of data sharing that they had previously used. Sharing data as supplemental files alongside a research paper was the most common method for all career levels (67%), followed by deposition in a public repository (59%) and sharing privately on request (49%). Only 10% of respondents reported that they had never shared their research data, the largest proportion of whom (42%) work in Medicine and Health Sciences disciplines. Sharing data privately upon request was more common for more experienced researchers (Figure 3).

Prevalence of data reuse
Respondents were also asked if they have ever reused someone else's data. 52% responded 'yes' and 48% 'no'. These proportions are very similar when segmenting for career stage cohorts, with the yes/no split being 51%/49% for early-career, 52%/48% for mid-career and 53%/47% for late-career.

IMPORTANCE AND SATISFACTION SCORES OF DATA SHARING TASKS
Respondents were asked to rate the importance and their satisfaction for 36 factors related to sharing or reusing data. These answers have been turned into importance and satisfaction scores (see Methods section for details). A score of 0 indicates that researchers do not find the factor at all important or they are completely dissatisfied with their ability to carry out the task. A score of 100 indicates that they regard the factor as of the highest importance or they are completely satisfied with their ability to undertake the task. The mean importance scores ranged from 37.8 to 85.0 and the mean satisfaction scores ranged from 41.4 to 69.1. Tasks related to data sharing and reuse have been grouped according to the section of the research lifecycle that they primarily fall in (Table 2). The following groupings were used in our analysis: data preparation, policy requirements, data publishing, and data reuse.

Data preparation
This stage includes time preparing data for sharing, such as organising files, deciding which datasets to share, describing the data and preparing usage rights statements. Overall, these factors scored in the lower to middle range of importance and mid to high satisfaction when considering all of the factors presented in the survey.

Policy requirements
The factors concerning policy requirements include policies from funders, institutions and journals related to data sharing and data management. In terms of importance, four of the factors rank very highly (between 62.6 and 73.8): three are policy compliance factors and the other concerns meeting funder requirements for data management plans. The other two factors in this group fall within the mid-range of scores for all factors surveyed. These factors scored towards the higher end of all the factors for mean satisfaction, with the three factors about compliance receiving the highest mean satisfaction scores when all factors are considered.

Figure 3
The most common method for sharing research data in the past is as supplemental files.

Data publishing
Factors around ensuring data are discoverable, citable and licensed correctly scored higher in importance than factors more related to the process of publishing data, e.g. spending less time uploading files. None of the factors scored towards the extremes of the range of importance scores for all factors. Respondents were generally satisfied with all factors when considered alongside all the factors surveyed.

Data reuse
Reuse of my data
Factors within this group were spread across a range of importance scores. Factors such as "increase the likelihood that my research benefits science" and "increase the likelihood that my research papers are cited" are amongst the highest rated of all factors for importance (85.0 and 70.0 respectively). Conversely, "ability to control who can use my data" is one of the lowest scoring factors in terms of importance (39.1). The factors in the group have similar mean satisfaction scores compared to other groups, ranging from 46.5 to 55.3, with only two factors scoring less than 50. The lowest satisfaction score belongs to "understand who is using my data set" (mean = 46.5, 95% CI = 2.1). This factor also scores 54.3 for mean importance, making it important yet underserved.

Reuse of other researchers' data
Factors in this group scored in the lower and middle range of all the importance scores, from 45.3 to 61.4. The most important factors in this group are the two associated with spending less time finding and getting hold of others' data. All of these factors had low satisfaction scores compared to the rest of the survey, ranging from 41.4 to 45.2. Two of the factors, "Spend less time searching for articles with reusable datasets" and "Spend less time making individual requests for datasets", were regarded as both important and not satisfied. Both are significantly underserved, as the 95% confidence intervals for their importance scores lie above 50 and those for their satisfaction scores lie below 50 ("Spend less time searching for articles with reusable datasets" mean importance = 61.4, 95% CI = 3.7 and mean satisfaction = 41.3, 95% CI = 2.6; "Spend less time making individual requests for datasets" mean importance = 58.9, 95% CI = 3.7 and mean satisfaction = 43.0, 95% CI = 2.6).
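The confidence-interval criterion for a significantly underserved factor can be checked directly from the reported figures. The sketch below assumes that each reported "95% CI" value is a half-width (that is, the interval is the mean plus or minus that value); the class and function names are hypothetical and this is not the study's own analysis code.

```python
from dataclasses import dataclass

NEUTRAL = 50.0  # neutral response on both the importance and satisfaction axes


@dataclass
class FactorScore:
    name: str
    importance: float        # mean importance score (0-100)
    importance_ci: float     # assumed 95% CI half-width
    satisfaction: float      # mean satisfaction score (0-100)
    satisfaction_ci: float   # assumed 95% CI half-width


def significantly_underserved(f: FactorScore) -> bool:
    """True when the whole importance interval is above neutral and
    the whole satisfaction interval is below neutral."""
    important = f.importance - f.importance_ci > NEUTRAL
    unsatisfied = f.satisfaction + f.satisfaction_ci < NEUTRAL
    return important and unsatisfied


# Values as reported for the two data reuse factors
factors = [
    FactorScore("Spend less time searching for articles with reusable datasets",
                61.4, 3.7, 41.3, 2.6),
    FactorScore("Spend less time making individual requests for datasets",
                58.9, 3.7, 43.0, 2.6),
]

# Hypothetical borderline factor (not from the survey): both means are on the
# "right" side of neutral, but the intervals cross 50.
borderline = FactorScore("hypothetical borderline factor", 52.0, 3.0, 48.0, 2.5)

for f in factors + [borderline]:
    print(f"{significantly_underserved(f)!s:5s} {f.name}")
```

The borderline case shows why the intervals matter: a factor whose mean importance is just above 50 is not counted as significantly underserved if its interval overlaps the neutral value.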

CAREER STAGE AND DISCIPLINARY DIFFERENCES

Career stage
Early-career researchers gave the highest importance scores to 22 out of the 36 factors surveyed when compared to mid- and late-career respondents. This difference was only statistically significant for 10 of these factors when a two-tailed t-test was used to compare the cohorts (p < .05). Late-career researchers gave the highest average scores for 6 factors and mid-career researchers gave the highest score for 7, but none of these differences were statistically significant when compared to the next-highest score. The mean score for one factor was the same for both early- and mid-career researchers, who scored it higher than late-career researchers. Differences in mean importance scores between the 3 career-based cohorts ranged from 1.7 to 25.5 between highest and lowest score for each factor.
Satisfaction scores were less variable by career stage. Although late-career researchers were on average more satisfied with their ability to complete the tasks, the scores between career stages were more similar in comparison to the importance scores, with the maximum difference being 11.1. Fewer statistically significant differences between the cohorts were seen with satisfaction scores using t-tests but again, no clear trends emerged.
One notable difference between the early and mid-career cohorts versus the late career cohort was that the factors "Spend less time making individual requests for datasets" and "Spend less time searching for articles with reusable datasets" both fell into the important and underserved segment for early and mid-career researchers when taking the 95% confidence intervals into account but did not for late career researchers (Table 3).

Discipline
One notable difference between the disciplines surveyed was seen in the answers provided by those who identified as researchers in physical sciences. This group scored more factors as both important and not satisfied than the other disciplinary groups. Researchers from the 'Social Science' group scored fewer factors as not satisfied than the other disciplinary cohorts. There was consensus across the disciplines that "Increase the likelihood that my research benefits science" was the most important factor. In all disciplines, factors relating to the reuse of other researchers' data received low satisfaction scores.

DISCUSSION AND CONCLUSION

LIMITATIONS
The survey focused on researchers from North America and Europe, a high proportion of whom have published with PLOS. This scope limitation was intended to help ensure a sufficiently large sample of researchers in certain regions to draw meaningful conclusions. Further, the survey was written only in English, and we assume that this affected response rates in some regions.
Some of the disciplinary samples (Earth Sciences, Engineering, and Physical Sciences) were too small to be considered representative of the corresponding research community. The high proportion of PLOS authors could impact our results, as their experience with data sharing requirements (PLOS's data availability policy is strong relative to most other publishers) may differ from the non-PLOS cohort. The factors that the survey asked about do not cover all aspects of data sharing, as they were derived in part from our assumptions about which tasks researchers might find problematic and where there might be opportunities for new solutions. They were also intended to be discipline-agnostic. For example, issues specific to certain types of data, such as sensitive data, are not included. There may be important and underserved needs around data sharing that were not tested in our survey. For example, we did not include factors relating to technical problems or the quality of the data that is being shared.

IMPACT OF CAREER STAGE
There is a general tendency of declining importance scores from Early- to Mid- to Late-Career researchers, and among researchers who have published more articles, although the differences are mostly not statistically significant at the p < .05 level. More experienced researchers and authors rate the importance of the majority of factors lower on average, although there are fewer differences in levels of satisfaction with existing tools, as satisfaction scores remain more stable across these segments. This suggests that ECRs regard these factors as more important, as opposed to having not yet mastered the tools needed to effectively share and reuse data. However, the factors with the greatest differences in mean importance scores between early- and late-career researchers can be considered those that are more likely to be relevant to junior researchers, for example "Increase my co-authorship opportunities" and "Spend less time making individual requests for datasets", both of which have a p value < 0.001.

COMPARISON WITH PREVIOUS RESEARCH
Multiple surveys have quantified how common researchers' problems or concerns are with data sharing (Allagnat et al. 2019; Borghi et al. 2018; Lucraft et al. 2019; Science et al. 2017; Tenopir et al. 2011; Wiley Open Science Researcher Survey 2016). Our findings suggest that while many factors (problems) associated with sharing research data are important to researchers, on average, researchers are reasonably satisfied with their ability to share data, from their perspective. Overall our findings are additive to previous research, providing additional context as to why solutions such as data repositories are still used by a minority of researchers, despite data repositories, ostensibly, being available for most types of research data. If researchers are generally satisfied with their ability to complete a task associated with data sharing, this suggests that researchers will be unlikely to be motivated to seek (new) solutions to that problem, no matter how common it is.
For example, our finding that the 'ability to control who can use my dataset' was only slightly important (39.1 importance) extends previous findings that researchers' concerns relating to misuse of their data are very common (Science et al. 2019). If this concern is common yet not very important to the average researcher, then it may be viewed as "low stakes", and not a motivator for action. The ability to 'trust the researchers who request my data' may also relate to potential for misuse of researchers' data but was rated as moderately important (58.3), and so not viewed as "low stakes", as was 'choose an appropriate license for my data' (54.4). Both of these factors, associated with reuse of researchers' own data, were somewhat satisfied, however.
The highest average score for importance was found for 'Increase the likelihood that my research benefits science' (85.0 importance; 54.9 satisfaction). This supports previous research, where increasing the benefit to science or society of research is commonly amongst the top reasons or motivations for sharing research data (Science et al. 2019). The second most important factor overall related to compliance with funder policies on data sharing (73.8 importance; 69.1 satisfaction), ranking it more highly than in previous research exploring factors that motivate data sharing (Science et al. 2019). The third most important factor related to increasing the likelihood that researchers' papers are cited (70.0 importance; 52.5 satisfaction). This reputational factor, citations, is consistent with increased impact of research and desire for greater credit (recognition) for data sharing found by previous research. It may also offer opportunities for promotion of the potential benefits of sharing research data by tool and service providers. Sharing research data is associated with increased citations to researchers' papers (Colavizza et al. 2020; Piwowar et al. 2013).

POTENTIAL OPPORTUNITIES RELATING TO DATA REUSE
While most of the factors we assessed appear to be reasonably well satisfied from the researchers' perspective, a small number of factors suggest potential opportunities for new or better solutions. Around half of survey respondents indicated that they have reused research data in the past, consistent with findings from other surveys (Science et al. 2018). Two factors feature in the upper left quadrant (Figure 4), albeit moderately so, and relate to reuse of other researchers' data:
- Spend less time searching for articles with reusable datasets
- Spend less time making individual requests for datasets
Both these factors are relevant to scholarly publishers, who can influence the accessibility and availability of research data associated with publications through research data policies (Vines et al. 2013) and associated workflows. Making research data available in repositories that enable compliance with the FAIR Data principles, and creating prominent and visible links to those data in journal articles, might be a simple solution to the first factor.
Researchers' dissatisfaction with obtaining research data from individual requests to other researchers has policy implications for journals and publishers who wish to further support open science and open research. While many journals now have policies on sharing research data, and many peer-reviewed papers include statements about the availability of data supporting publications, many of those statements state that data are "available on [reasonable] request". Multiple studies (Rowhani-Farid et al. 2016; Savage & Vickers 2009; Vanpaemel et al. 2015; Wicherts et al. 2006) have found that researchers have been unable to obtain data supporting publications when those data are 'available on request', consistent with the dissatisfaction amongst our survey respondents. Since PLOS introduced its data availability policy in 2014, "data available on request" has not been permitted when publishing in PLOS journals. Publishers can potentially meet this data reuse need by strengthening their policies on data sharing: requiring all data supporting publications to be publicly available unless legal or ethical restrictions apply, working to eliminate "data available on request" as an acceptable policy, and, in cases where data must be available under restricted access, requiring information on conditions and procedures for data access and reuse.

OPPORTUNITIES TO BETTER SUPPORT DATA SHARING
The relative unimportance of some factors associated with best data publishing practice, such as deposition of data in repositories, suggests the need for more advocacy to researchers and education about the benefits, or for data repositories to be more integrated with the traditional publishing experience in such a way that researchers do not need to change their behaviour in order to use them. Amongst PLOS authors and the survey respondents, the most common method for sharing data is via supplemental (supporting information) files with their publications. More than half of our survey respondents indicated they had shared data in a repository in the past, but when published articles are analysed, data repositories are used by around a quarter of authors publishing with PLOS. At PLOS this proportion has been slowly growing each year, from 18% of authors in 2015 (Colavizza et al. 2020).

Figure 4
Respondents were on average satisfied with their ability to complete the majority of tasks associated with Data Preparation, Data Publishing and Reuse of their own data but dissatisfied with their ability to complete tasks associated with Reuse of other researchers' data. Tasks associated with meeting policy requirements are important and satisfied. 95% Confidence intervals for the mean values ranged from 1.7 to 4.1 for importance scores and 1.7 to 2.9 for satisfaction scores.