Practice Papers

FAIRness Literacy: The Achilles’ Heel of Applying FAIR Principles

Authors:

Romain David, MISTEA, INRAE, Montpellier SupAgro, University of Montpellier, Montpellier; ERINHA (European Research Infrastructure on Highly Pathogenic Agents) AISBL, Paris, FR

Laurence Mabile, INSERM-Université Paul Sabatier Toulouse III, Toulouse, FR

Alison Specht, SEES-TERN, the University of Queensland, St Lucia, AU

Sarah Stryeck, Graz University of Technology, Institute for Interactive Systems and Data Science, Graz, AT

Mogens Thomsen, INSERM-Université Paul Sabatier Toulouse III, Toulouse, FR

Mohamed Yahia, INIST-CNRS, Nancy, FR

Clement Jonquet, LIRMM, University of Montpellier, CNRS, Montpellier, FR

Laurent Dollé, Erasme Hospital, Brussels, BE

Daniel Jacob, INRAE, UMR BFP, Université de Bordeaux, Villenave d’Ornon, FR

Daniele Bailo, EPOS-ERIC/Istituto Nazionale di Geofisica e Vulcanologia, Rome, IT

Elena Bravo, Istituto Superiore Sanità, Rome, IT

Sophie Gachet, Aix Marseille Université, CNRS, Avignon Université, IRD, IMBE, Marseille, FR

Hannah Gunderman, Carnegie Mellon University, Pittsburgh, US

Jean-Eudes Hollebecq, MISTEA, INRAE, Montpellier SupAgro, Université de Montpellier, Montpellier, FR

Vassilios Ioannidis, SIB Swiss Institute of Bioinformatics, Lausanne, CH

Yvan Le Bras, DGD-REVE, UMS PatriNat, MNHN, Paris, FR

Emilie Lerigoleur, CNRS, UMR GEODE, Université Toulouse 2, Toulouse, FR

Anne Cambon-Thomsen, INSERM-Université Paul Sabatier Toulouse III, Toulouse, FR

The Research Data Alliance – SHAring Reward and Credit (SHARC) Interest Group

Abstract

The SHARC Interest Group of the Research Data Alliance was established to improve crediting and rewarding mechanisms for scientists who wish to organise their data (and material resources) for community sharing. This requires that data are findable and accessible on the Web and comply with shared standards making them interoperable and reusable, in alignment with the FAIR principles. Making data FAIR takes considerable time, energy, expertise and motivation, so it is imperative to facilitate the process in order to encourage scientists to share their data. To that end, supporting compliance with the FAIR principles and increasing the human understanding of FAIRness criteria – i.e., promoting FAIRness literacy – and not only the machine-readability of the criteria, are critical steps in the data sharing process. Appropriate human-understandable criteria must be identified first in the FAIRness assessment process and roadmap. This paper reports the lessons learned by the RDA SHARC Interest Group in identifying the processes required to prepare FAIR implementation in communities that are not specifically data-skilled, and the procedures and training that must be deployed and adapted to each practice and level of understanding. These are essential milestones in developing the adapted support and credit-back mechanisms that are not yet in place.
How to Cite: David, R., Mabile, L., Specht, A., Stryeck, S., Thomsen, M., Yahia, M., Jonquet, C., Dollé, L., Jacob, D., Bailo, D., Bravo, E., Gachet, S., Gunderman, H., Hollebecq, J.-E., Ioannidis, V., Le Bras, Y., Lerigoleur, E., Cambon-Thomsen, A. and the Research Data Alliance – SHAring Reward and Credit (SHARC) Interest Group, 2020. FAIRness Literacy: The Achilles’ Heel of Applying FAIR Principles. Data Science Journal, 19(1), p.32. DOI: http://doi.org/10.5334/dsj-2020-032
Submitted on 03 Feb 2020; Accepted on 27 Jul 2020; Published on 11 Aug 2020

Introduction

This paper reports on work developed as part of the Research Data Alliance – SHAring Rewards and Credit Interest Group (RDA – SHARC IG), established in 2017 and gathering people from various disciplines (biomedicine, biodiversity, agriculture, geosciences, science information, semantic science) to advance crediting and supporting mechanisms for scientists who strive to organise their data (and material resources) for community sharing. The group’s work led progressively to the notion that increasing FAIRness literacy in the research field, and enabling the assessment of FAIRness by all scientific communities, are critical prerequisite steps for making data and resource sharing truly happen. This paper aims to share this experience.

Making resources available to the community means ensuring that data (and related materials) are findable and accessible on the Web (Mabile et al. 2016), and that they comply with adopted international standards making them interoperable and reusable by others (Hansen et al. 2018; Lannom et al. 2019). This aligns with the FAIR data principles proposed as part of the international FORCE11 initiative,1 to make data Findable, Accessible, Interoperable and Reusable (Wilkinson et al., 2016). Data should be made as FAIR as possible, even when it is not possible to make them open (Landi et al. 2019). However, the FAIR principles do not address data quality, even though it is a major concern for improving data sharing; they focus on mechanisms to facilitate and optimize data sharing, which does not remove the reuser’s responsibility for assessing the quality and appropriateness of the data being reused.

In practice, making data FAIR requires considerable time, energy and expertise (Curty et al., 2017), and it is still not clear who should contribute to the process. As a result, except in specific communities routinely dealing with big data (nuclear physics, astronomy, genomics, satellite imagery…), the FAIR principles are not considered a priority and are either not implemented rigorously or implemented very heterogeneously across disciplines and countries.

This situation persists even though much effort has been devoted in recent years to turning the FAIR principles into practice. Some technical solutions are now well described (Wilkinson et al. 2017) and many groups (including within the RDA) are working to develop methods and tools. Multiple implementation networks have been created under the GO FAIR initiative to facilitate cross-discipline exchanges on FAIRification. Infrastructures for preserving, locating and reusing research data already exist (Wittenburg et al., 2019), such as those built in Europe in the context of the ESFRI roadmap2 and the implementation of the European Open Science Cloud (EOSC).3 In addition, worldwide validated certifications are now possible, such as those delivered by the CoreTrustSeal,4 which requires compliance with 16 trustworthiness requirements for approved Core Trustworthy Data Repositories.

On the funders’ side, handling research data according to the FAIR principles as part of Data Management Plans (DMPs) is now strongly encouraged by most funders and even required in certain funding schemes. This holds true for many applications to European Commission calls, including H2020, and for some national funding agencies (e.g., the French ANR,5 US NIH,6 UK Wellcome Trust,7 Austrian FWF8). Even projects that opt out are still encouraged to submit a FAIR-compliant DMP on a voluntary basis (Jones et al. 2019). The RDA has several groups working on specifying DMPs, establishing recommendations for their format, and building mechanisms to facilitate their creation and use.
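To make the notion concrete, the sketch below shows what a machine-actionable DMP fragment can look like, rendered here as Python producing JSON. The field names follow our reading of the RDA DMP Common Standard for machine-actionable DMPs, and all identifiers and values are illustrative placeholders, not a validated instance; consult the published schema before relying on it.

  import json

  # Hypothetical machine-actionable DMP fragment. Field names follow our
  # reading of the RDA DMP Common Standard; identifiers are placeholders.
  dmp = {
      "dmp": {
          "title": "Example project data management plan",
          "created": "2020-02-03T00:00:00Z",
          "modified": "2020-07-27T00:00:00Z",
          "dmp_id": {"identifier": "https://doi.org/10.1234/example-dmp", "type": "doi"},
          "contact": {
              "name": "Jane Researcher",
              "mbox": "jane.researcher@example.org",
              "contact_id": {"identifier": "https://orcid.org/0000-0000-0000-0000", "type": "orcid"},
          },
          "dataset": [{
              "title": "Phenotyping measurements 2020",
              "dataset_id": {"identifier": "https://doi.org/10.1234/example-data", "type": "doi"},
              "distribution": [{
                  "title": "CSV export",
                  "format": "text/csv",
                  "access_url": "https://repository.example.org/record/123",
              }],
          }],
      }
  }

  # Serialise for exchange between DMP tools and funder systems.
  print(json.dumps(dmp, indent=2))

The point of such a structured format is that funders, repositories and institutions can read the same plan by machine, rather than re-extracting it from free text.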

Although essential, the methods and tools developed so far focus on supporting machine actionability – the capability of computational systems to use services on data without human intervention (Schwardmann, 2020) – which makes them difficult to use and understand beyond data scientist communities and greatly limits their uptake. Implementing the services and tools required to obtain FAIR-compliant data is only possible if the scientists who lead projects, and who are not experts in data science, are able to understand the FAIRification process as a whole, the stakes, and the steps required. This depends on reaching community-approved choices of the necessary tools, services and standards; the willingness of non-data experts to comply with the FAIR principles is a prerequisite of such an approach. Thus, to deal with the increasing number of tools, standards and services released, sufficient and appropriate guidance and support are needed. To that end, appropriate criteria must be clearly defined to enable FAIRness assessment, and methods and processes must be developed to enable FAIRification. With the new age of Open Science, the European Commission Working Group on Rewards under Open Science9 delivered a matrix to help implement and assess FAIR criteria during the elaboration of DMPs (EC DGRI, 2017). Based on this matrix, Reymonet et al. (2018) made a detailed plan of the quality criteria required during the three phases of development of DMPs, while Wilkinson et al. (2018) classified the FAIR criteria according to their ability to be validated automatically.

We believe that these criteria are not easily put into practice by researchers and are difficult for evaluators to interpret, as they rely mainly on machine-actionability properties that may not be accessible to every scientist. In this respect, many recent collaborative works have started to propose ways to implement, adapt and evaluate the FAIR principles in several communities (e.g., Herschel et al. 2017; Doorn 2018; Doorn & Timmermann 2018; Federer et al. 2018; Mons et al. 2017; Stall et al. 2018; de Miranda Azevedo & Dumontier 2019; Erdmann et al. 2019; Sansone et al. 2019). However, we are also reaching a moment where the FAIR principles need cross-community convergence and consensus (EC DGRI, 2016; EC DGRI, 2018; Jacobsen et al., 2019; Sustkova et al., 2019; Thompson et al., 2019; Wilkinson et al., 2019). Work on FAIR data standards, repositories and policies is already ongoing, as well illustrated by the FAIRsharing.org platform, which gathers more than 2,800 registered standards, databases and policies (McQuilton et al., 2019; Sansone et al., 2019), by various RDA Working Groups (e.g., the FAIR Data Maturity Model WG), and by the international GO FAIR initiative (Schultes et al. 2018). The FAIR principles need to be implemented in a progressive and community-oriented way, consolidated within existing practices, to ensure that those practices evolve without interruption and in a way that is acceptable to the various actors. With this approach, everyone’s ability to assess FAIRness is at the heart of the process.

In consideration of the tools available, and on the basis of discussions at workshops (breakout sessions at the RDA 13th and 14th plenaries), an online survey, and multiple weekly teleconferences, the SHARC Interest Group developed and discussed a FAIRification tool more attuned to the broader research community, enabling FAIRness assessment either by external data evaluators or by researchers themselves. In doing so, we addressed the building of the pre-FAIRification processes that communities need in order to better understand the requirements and stakes of FAIRification. This paper describes the matters that came to light as we worked towards the creation of such a tool.

Developing and proposing an interdisciplinary language

Because of the interdisciplinary nature of our IG’s work, various interpretations of the underlying concepts have arisen, implying a need for clear definitions. A glossary – designed to evolve – has been created using as many community-approved references as possible, complemented by other terms (available in the “Glossary” tab of the SHARC tool; David et al. 2020).

Converging towards consensual human-readable, understandable & assessable criteria

Encouraging the implementation of efficient data sharing methods requires that they can be assessed easily, both qualitatively and quantitatively, by any scientist. In particular, the FAIR data principles must be intelligible to any research scientist, even one who is not a data scientist. FAIRness assessment must be realistic and pragmatic: what should be measured, and how can the information needed be found explicitly?

Human-understandable FAIRness assessment criteria of this kind would help guide scientists in following the FAIR data principles and would help evaluators achieve their task objectively.

On the basis of the initial criteria developed in particular by Wilkinson et al. (2016; 2018) and Reymonet et al. (2018), the SHARC IG worked to provide a FAIR assessment template drawing on a new classification of FAIR criteria that is aligned with the questions researchers should address while elaborating DMPs. This template is meant to be adaptable to the needs of various communities and is designed to train and support them in improving their data FAIRification. The sets of FAIR criteria and their relations have been summarised as mind maps to present a quick overview of the FAIR aspects (Figures 1, 2, 3, 4). The assessment tool is available at http://doi.org/10.5281/zenodo.3922069.

Figure 1

Mind map of FAIRness assessment criteria, showing the extensive set of FINDABLE criteria of the FAIR principles (12 criteria). Each criterion is assigned one of three levels of importance, illustrated by three colours: Essential (purple), Recommended (brown), Desirable (red).

Figure 2

Mind map of FAIRness assessment criteria, showing the extensive set of ACCESSIBLE criteria of the FAIR principles (11 criteria). Each criterion is assigned one of three levels of importance, illustrated by three colours: Essential (purple), Recommended (brown), Desirable (red).

Figure 3

Mind map of FAIRness assessment criteria, showing the extensive set of INTEROPERABLE criteria of the FAIR principles (5 criteria). Each criterion is assigned one of three levels of importance, illustrated by three colours: Essential (purple), Recommended (brown), Desirable (red).

Figure 4

Mind map of FAIRness assessment criteria, showing the extensive set of REUSABLE criteria of the FAIR principles (17 criteria). Each criterion is assigned one of three levels of importance, illustrated by three colours: Essential (purple), Recommended (brown), Desirable (red).
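The template itself only defines the criteria and their importance levels; how a community aggregates an assessment is left open. As one hypothetical illustration (not part of the SHARC template), the Python sketch below weights criteria by importance level and reports, per FAIR facet, the weighted share of criteria met; the weights themselves are illustrative assumptions.

  # Hypothetical aggregation of a FAIRness self-assessment. The importance
  # levels come from the SHARC template; the weights and scoring rule below
  # are illustrative assumptions, not part of the template.
  WEIGHTS = {"essential": 3, "recommended": 2, "desirable": 1}

  # Each entry: (FAIR facet, importance level, criterion met?).
  assessment = [
      ("findable", "essential", True),     # e.g., dataset has a persistent identifier
      ("findable", "recommended", False),  # e.g., metadata indexed in a registry
      ("accessible", "essential", True),   # e.g., retrievable via a standard protocol
      ("reusable", "desirable", False),    # e.g., detailed provenance recorded
  ]

  def facet_scores(entries):
      """Return, per facet, the weighted share of satisfied criteria (0..1)."""
      totals, earned = {}, {}
      for facet, level, met in entries:
          weight = WEIGHTS[level]
          totals[facet] = totals.get(facet, 0) + weight
          earned[facet] = earned.get(facet, 0) + (weight if met else 0)
      return {facet: earned[facet] / totals[facet] for facet in totals}

  print(facet_scores(assessment))
  # {'findable': 0.6, 'accessible': 1.0, 'reusable': 0.0}

A community could equally decide that any unmet Essential criterion blocks a facet outright; the point is that the aggregation rule, like the criteria themselves, should be community-approved.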

Lessons learned: need for a gradual implementation of FAIR criteria

While collectively designing and discussing the FAIR assessment tool, a major concern emerged: the need for a gradual implementation of the FAIR criteria, in which the levels of achievement at each stage are made explicit. The collective feedback was analysed and formalised as a step-by-step iterative process for FAIR data sharing, described in parts A to C and further discussed in parts D to F.

A– Fostering pre-FAIRification decisions: when shared interoperability needs become community-approved vocabulary goals

A better perception of the gains and return on investment from the use of controlled vocabularies (especially in the form of ontologies) is critical, as it triggers the need to prepare FAIRification and ensure interoperability.

Each of the four categories of FAIR criteria refers to the use of community-approved formats, standards and controlled vocabularies (Hansen et al. 2018). Common naming conventions for classifying the standards used to report and share data or other objects appear favourable, provided that those standards are registered and therefore reusable and mappable. The FAIR criteria as defined by Wilkinson et al. (2016) refer to FAIR-compliant vocabularies used to ensure interoperability. The use of these reference vocabularies or ontologies also improves the indexing and findability of data, and sometimes makes it possible to cite an author for a selection of data only, which can give additional impact to original studies by generating more citations. Standard vocabularies and ontologies also allow inclusion in the many catalogues formatted to these recommendations. They offer a greater ability to maintain data over the long term, allowing much greater backward compatibility with new software and new innovative processing chains for old data. Finally, the use of vocabularies and ontologies increases the number of links to a dataset, and therefore the possibilities of accessing and reusing all or part of it.

However, the use or implementation of controlled vocabularies or ontologies may seem complex and time-consuming, especially since the means to find and explore them are still not fully harmonized (e.g., in the biomedical domain, notably prolific in semantic resources, four or five different ontology repositories or libraries are available). It is crucial that the numerous potential benefits are better explained to each actor of a given scientific community. The responsibility lies, among others, with repositories and standard providers, who must reach out to the community through targeted education, publicity at events, and the reporting of success stories.
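By way of illustration, most ontology repositories expose a search API through which a suitable term can be found programmatically. The sketch below queries the NCBO BioPortal repository (one of the biomedical repositories alluded to above); "MY-API-KEY" is a placeholder for a key obtained with a free BioPortal account, and the response fields used are those of BioPortal’s documented JSON output.

  import requests

  # Search the NCBO BioPortal ontology repository for a term to reuse as a
  # controlled value in metadata. "MY-API-KEY" is a placeholder.
  response = requests.get(
      "https://data.bioontology.org/search",
      params={"q": "body temperature", "apikey": "MY-API-KEY"},
      timeout=30,
  )
  response.raise_for_status()

  # Each hit carries the concept IRI ("@id") and its preferred label, i.e.,
  # exactly what is needed to annotate data unambiguously.
  for hit in response.json()["collection"][:5]:
      print(hit["@id"], "-", hit.get("prefLabel"))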

Nonetheless, a first level of interoperability can be reached easily, and with a substantial return on investment, by using the metadata vocabularies recommended by the W3C to make data from different domains and disciplines compatible with each other. Semantic Web standards from the W3C, such as RDF, OWL and SKOS, support the machine-readability of metadata and data.
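As a minimal sketch of this first level, the following Python code (using the rdflib library) describes a hypothetical dataset with the W3C-recommended DCAT, Dublin Core and SKOS vocabularies; all IRIs are illustrative placeholders.

  from rdflib import Graph, Literal, URIRef
  from rdflib.namespace import DCAT, DCTERMS, RDF, SKOS

  # Describe a hypothetical dataset with W3C-recommended vocabularies:
  # DCAT for the dataset itself, Dublin Core terms for descriptive
  # metadata, and SKOS for a keyword drawn from a controlled vocabulary.
  g = Graph()
  dataset = URIRef("https://repository.example.org/dataset/123")
  keyword = URIRef("https://vocab.example.org/concept/phenotyping")

  g.add((dataset, RDF.type, DCAT.Dataset))
  g.add((dataset, DCTERMS.title, Literal("Wheat phenotyping measurements 2020", lang="en")))
  g.add((dataset, DCTERMS.creator, URIRef("https://orcid.org/0000-0000-0000-0000")))
  g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
  g.add((dataset, DCAT.theme, keyword))
  g.add((keyword, RDF.type, SKOS.Concept))
  g.add((keyword, SKOS.prefLabel, Literal("phenotyping", lang="en")))

  # Turtle serialisation: metadata readable by both humans and machines.
  print(g.serialize(format="turtle"))

Because the dataset description, the creator identifier and the keyword are all IRIs, any other system using the same vocabularies can link to or query this record without bespoke translation.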

B– Dealing with the unequal understanding of FAIR criteria

Even within a particular scientific community, definitions and interpretations of the concepts covered by the FAIR evaluation criteria are manifold. Diverging perceptions of the minimum work to be achieved in the short and long term often lead either to a deep underestimation of the means to be implemented or, even worse, to situations where significant resources are allocated to FAIRification yet the solutions provided do not meet the prerequisites and create additional roadblocks.

Two impediments relate to specific types of communities:

  • In more mature scientific communities there may be a reluctance to change familiar tools and entrenched habits. FAIRification will probably require a great deal of effort, a change in mind-set and training in new skills.
  • Emerging scientific communities are often still embryonic; the complexity of the topics covered and the emergence of jargon (particularly in interdisciplinary fields) increase the difficulty of organizing complex processes around data at the interfaces of different scientific cultures.

These two impediments can be removed by detailing concrete actions, and the associated means, for pre-FAIRification. Such actions must explain the meaning of each criterion, and the benefit of complying with it, within the framework of the community. In addition, these actions must not only be approved by the domain community as bringing a real improvement in the FAIR quality of data, but also meet all the criteria for sustainable reuse beyond the community.

Furthermore, convincing all stakeholders that compliance with the FAIR criteria can foster data quality improvement – not only in terms of data management for reuse but also for the initial uses of the data – will be a valuable asset.

Moreover, increasing global FAIRness compliance absolutely requires raising the level of FAIRness literacy of each actor, particularly team managers and project leaders, who have the role of arbitrating between the recruitment of several human profiles.

C– Planning pre-FAIRification training and support

Essential criteria (i.e., FAIR criteria that would block the FAIRification process if not applied) are not always understandable without specific education or training. Our collective discussions frequently underlined that the implementation of some criteria is thought to be time-consuming and may need sufficient, appropriate technical support adapted to the capabilities of each actor (e.g., skills, language, availability).

The adoption of the FAIR data principles requires a cultural change. The first desirable (and desired) step for researchers is to better understand the efforts required to meet each FAIR criterion in a suitable manner that optimizes costs (time, human resources, skills). This comprehension of the necessary efforts is a sine qua non condition for researchers, helping them choose the least costly strategies to FAIRify their data. These efforts can relate to:

  • Policies: e.g., the choice of the perimeter of the community in which the FAIRification process will be implemented;
  • Strategic and tactical steps: technological choices, and implementation steps for wide acceptance in the predefined community;
  • Pedagogical steps: e.g., providing educational kits adapted to different levels of skill, up to ‘qualified training’ in which trainees’ knowledge is evaluated;
  • Human resources: enabling researchers, engineers and technicians to acquire the necessary skills, at different time scales;
  • Governance: training managers to improve the relevance of their arbitrations and the planning of the means necessary for FAIRification in the short and long term.

It is also necessary that all the benefits of FAIRification are clearly explained and demonstrated. Among these benefits is increased research impact, with data becoming a new entry point. FAIRification also saves time when setting up new data processing chains, manipulating data, cleaning data, reconstructing missing data, or feeding machine learning processes more rigorously. FAIRification of data and services is also expected to support Virtual Research Environments (VREs) that are more reliable and allow more reproducible research.

The strategic and tactical choices for implementing FAIRification must allow for a gradual implementation. Each step must be understood by every stakeholder; the decision process for prioritising criteria should be defined (this may depend on the area of application), as should, possibly, the list of criteria that can be neglected if the means are not sufficient. For each criterion, it is useful to show the return on investment, especially in terms of the scope, speed, quality and richness of shared treatments. All these elements must be included in training and support materials.

D– Planning a step-by-step pre-FAIRification process

Considering the diversity of actors whose contribution is necessary for explaining, training and supporting FAIRification, it is imperative to rely on planning tools, to structure the organization of the actors, and to prioritize the steps required for each of them, according to the means and skills available.

This pre-FAIRification process is a sine qua non step for community acceptance of the efforts required for later FAIRification, especially if it is to have an efficient long-term effect and address interoperability between heterogeneous systems.

Pre-FAIRification involves different stakeholders: funders, policy makers and publishers (to make FAIR data a requirement), institutions (to provide infrastructure, training, support and policies in their departments, e.g., the library), and disciplinary communities (to create community standards). This illustrates that researchers have to be supported throughout the entire life cycle of research data, from generation to final sharing and archiving.

To empower scientists for these efforts, it is necessary to co-construct the FAIRification planning tools, taking into account the resources of each actor (e.g., DMPs in a research project are typically under the responsibility of the project coordinator). This is all the more important as projects in future research work programmes (e.g., the European Commission’s Horizon Europe) are announced to be granted only if the FAIRification processes are properly detailed and correctly sized in the proposal (human and technical resources in particular).

Our discussions resulted in the identification of community milestones that organise FAIRification as an iterative quality process (described below and illustrated as the wheel of FAIRification in Figure 5). FAIRification should include four distinct steps in an iterative process:

  1. Preparing: FAIRification for a specific scientific community first requires that what is meant by FAIRification is carefully explained, and that the short-, medium- and long-term constraints and advantages are described.
  2. Training: to improve FAIRness literacy and thereby convince stakeholders across the whole community. Note that this literacy should be maintained while FAIRification is being planned, especially when planning takes place before funding.
  3. Pre-FAIRifying: this stage must be feasible for all actors. It encompasses the largest common denominator of objectives achievable by all, and its success is crucial for empowering the whole community further along the FAIRification process. Pre-FAIRifying can be divided into the following iterative steps:
    • 3a. Defining the community: a set of actors agreeing on the way they delimit their own community and the subjects related to it;
    • 3b. Defining the objects and variables to be FAIRified;
    • 3c. Selecting what to identify and index (data, actors, objects, processes, etc.);
    • 3d. Analysing the common denominators;
    • 3e. Reducing the explicit needs and expectations related to this first step, in order to ensure the achievement of a common objective for all stakeholders (downward levelling).
  4. FAIRifying: applying the prepared plan, checking whether the results comply with the community-approved plan, and adjusting if necessary.
Figure 5 

FAIRification can be schematized as a wheel describing iterative quality steps that need to be approved by the community throughout the process. The schema displays the “preparing” and “training” phases as preconditions of pre-FAIRification. The pre-FAIRification processes must be community-approved at each iteration, and the implementation of the FAIRification steps ‘check’ and ‘adjust’ must be approved by the community before a new iteration.

Whenever a common goal is reached, the community can redefine a new objective (and ideally enlarge it). Planning the second iteration with a new statement of expectations (returning to step 3b, or to step 3a if the community has evolved, which is very often the case) requires the “Preparing” and “Training” steps to be revisited first.

E– Maintaining sustainability

The complex concepts behind FAIRification keep evolving quickly: because of the constant improvement of knowledge and processes, information systems for research data are fed by constantly renewed protocols. FAIRifying new data and old data requires different iterative processes, based on a consultation of all participants each time a new concept is needed; the two situations demand differentiated human, technical or financial resources and organisations.

Newly produced data are increasingly heterogeneous and multi-source, sometimes even in exotic formats. A new set of data, resulting from the exploration of a new research subject and/or the implementation of an experimental process based on emerging technologies, is still rarely based on a fixed scheme allowing an immediate match with the FAIR principles; human and technical support over time is critical for implementing them. For old data, unless their FAIRification is a research subject in its own right, FAIRification currently cannot be funded other than through responses to calls for research projects that reuse these data.

Whether the data are old or recent, FAIRification is only possible if it occupies a central place in the research project, and is therefore understood and adhered to by all stakeholders as a long-term issue. Since it is clear, even today in calls for projects on FAIRification, that the level of understanding of FAIR issues and of the needs for sustainable FAIRification is still very uneven, specific efforts are absolutely required for each type of actor.

F– Increasing good research data sharing during pre-FAIRification processes by rewarding and crediting

Among the obstacles to data sharing identified by the RDA SHARC IG is the lack of recognition. This is all the more true considering the efforts required for FAIRification, which is critical for efficient data sharing.

To date, research institutions do not take the implementation of the FAIR principles into account when evaluating researchers, and are themselves not evaluated on the extent to which they support their researchers in doing so. It is therefore essential that FAIRification activity be assessed in research evaluation schemes (a policy issue at institutional, national and supranational levels). The specific mechanisms necessary for such evaluation schemes (e.g., the use of identifiers for researchers and their data to enable credit-back) have to be included early in the FAIRification process. These mechanisms primarily aim to link shared data persistently and unambiguously within the global web of data, associating them with their creators/contributors and institutions (with relevant metadata) so that the paternity of shared data is unequivocally attributed. Perennial and unambiguous identification processes are therefore needed for data on one side, and for persons (creators, contributors) and research institutions on the other. From many discussions emerged the idea that such permanence will be guaranteed only if communities are able to create, choose and administer consensual data authorities over time.

Various types of identifiers currently exist for research outputs (DOIs, ARKs, Handles, URIs, etc.), for researchers (ORCID, ResearcherID, Scopus IDs, etc.) and for research organisations (ROR IDs, GRID, ISNI). Together with Crossref, the current ecosystem makes it possible to interconnect scientists unambiguously with their deposited datasets, publications, other professional contributions or activities (e.g., open peer reviews), research organisations, funders and so forth. Online-generated metrics can also help credit the work, and rewards of various kinds may follow if this is taken into account in the research evaluation scheme (e.g., project awarding, dedicated financial support, career promotion).
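As a minimal sketch of this interconnection, the Python code below resolves the DOI of the SHARC assessment tool (David et al. 2020) through standard DOI content negotiation and reads the creator identifiers out of the returned DataCite metadata. The media type and field names are those documented by DataCite; the exact content of any response depends on what the depositor registered.

  import requests

  # Resolve a dataset DOI to machine-readable metadata via DOI content
  # negotiation (the DOI below is the SHARC assessment-tool deposit, which
  # is registered with DataCite, hence the DataCite JSON media type).
  response = requests.get(
      "https://doi.org/10.5281/zenodo.3922069",
      headers={"Accept": "application/vnd.datacite.datacite+json"},
      timeout=30,
  )
  response.raise_for_status()
  record = response.json()

  # Creators may carry name identifiers (e.g., ORCID iDs), linking the
  # dataset persistently and unambiguously to the people who produced it.
  for creator in record.get("creators", []):
      ids = [ni.get("nameIdentifier") for ni in creator.get("nameIdentifiers", [])]
      print(creator.get("name"), ids)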

Conclusion

The RDA – SHARC IG has established that the FAIR data principles need to be adequately explained from the very beginning of a research project’s design, and that training needs to be provided as early as possible. To help implement this, the SHARC IG is developing a tool to support the assessment of FAIRness literacy and thereby enable the measurement of progress towards compliance.

When developing this tool we identified a step-by-step process that will help teams organise the various actions towards the achievement of FAIR data: (i) by pre-FAIRifying, and (ii) by anticipating heterogeneity in FAIR literacy and in sustainability. Even if the FAIR goal is not reached in one step, this approach can make FAIRification more understandable and improve its acceptability.

We highlight that researchers should be supported by data management professionals (not only data stewards), organised in networks and embedded in institutions. To enhance the treatment of data according to the FAIR principles, we suggest that organisations be assessed on how well they support their researchers in becoming FAIR advocates.

This shared experience provides arguments to motivate researchers and institutions to invest sufficient means (human and financial) in structured, long-term FAIRification processes.

Nevertheless, FAIR compliance alone will certainly not be enough to achieve genuine, widespread data sharing and reuse; other mechanisms and qualities should be considered in future sharing processes (e.g., data veracity, quality, publicity, broad indexation, curation, support). Developing a strategy to orchestrate efforts across the variety of communities should be a first priority, in order to avoid dispersed attempts at standards adoption and FAIR compliance.

Notes

5. The ANR introduced a Data Management Plan requirement for projects funded from 2019 onwards.

6. NIH Data Sharing Policy and Implementation Guidance.

9. Working Group on Rewards | Open Science – Research and Innovation – European Commission.

Acknowledgements

This work was partly supported by the Research Data Alliance, especially the “RDA Europe 4.0” project (H2020 grant Nº777388), the EPPN2020 project (H2020 grant Nº731013), the ‘Infrastructure Biologie Sante’ PHENOME-EMPHASIS project funded by the French National Research Agency (ANR-11-INBS-0012) and the ‘Programme d’Investissements d’Avenir’.

This research is also a product of the PARSEC group funded by the Belmont Forum as part of its Collaborative Research Action on Science-Driven e-Infrastructures Innovation (SEI2018) and by the French National Research Agency (ANR-18-BELM-0002-02).

Complementary support was provided through the FAIRplus project, which has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement Nº802750; the JU receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. During his involvement in ERINHA AISBL, Romain David was supported by the EOSC-Life European programme under grant agreement Nº824087 and the ERINHA-Advance European programme under grant agreement Nº824061. Clement Jonquet was supported in part by the French National Research Agency D2KAB project (ANR-18-CE23-0017). Special thanks to Julien Lecubin from OSU Pytheas, France, for technical support with the survey, and to all the participants. Laurent Dollé is grateful for project support from Innoviris.

Competing Interests

The authors have no competing interests to declare.

Author Contributions

  • Writing Paper: RD, LM, SS, LD, DJ, YLB, CJ
  • Reviewing & editing paper: ACT, JEH, MT, MY, SG, AS, HG, EB, EL, VI
  • Discussions about criteria implementation processes: SHARC IG members, especially during RDA 13th and 14th plenary meeting
  • Key criteria conception: RD, LM, ACT, MT, MY
  • Survey conception: RD, LM
  • Survey review: SG, AS, ACT, MT, MY
  • Survey completion: JEH, DB, HG, VI, 4 anonymous
  • Survey analyses: RD

References

  1. Curty, RG, Crowston, K, Specht, A, et al. 2017. Attitudes and norms affecting scientists’ data reuse. PLOS ONE, 12: e0189288. DOI: https://doi.org/10.1371/journal.pone.0189288 

  2. David, R, Mabile, L, Specht, A, et al. 2020. Templates for FAIRness evaluation criteria – RDA-SHARC ig (Version 1.1) [Data set]. Zenodo. DOI: https://doi.org/10.5281/zenodo.3922069 

  3. de Miranda Azevedo, R and Dumontier, M. 2019. Considerations for the Conduction and Interpretation of FAIRness Evaluations. Data Intelligence, 285–292. DOI: https://doi.org/10.1162/dint_a_00051 

  4. Doorn, P and Science Europe. 2018. Science Europe Guidance. Presenting a Framework for Discipline-specific Research Data Management. [WWW Document]. URL http://www.scienceeurope.org/wp-content/uploads/2018/01/SE_Guidance_Document_RDMPs.pdf [Accessed January 07, 2020]. 

  5. Doorn, P and Timmermann, M. 2018. Towards Domain Protocols for Research Data Management (IG Domain Repositories RDA 9th Plenary meeting Community-driven Research Data Management). Paper presented at the 9. Plenary meeting Community-driven Research Data Management, Barcelona. https://www.rd-alliance.org/sites/default/files/attachment/RDA%20DRIG%20Domain%20Protocols%20V3%20Barcelona%20April%202017%20-%20DoornAerts.pptx [Accessed January 07, 2020]. 

  6. Erdmann, C, Simons, N, Otsuji, R, et al. 2019. Top 10 FAIR Data & Software Things. [WWW Document]. [Accessed January 07, 2020]. DOI: https://doi.org/10.5281/zenodo.2555498 

  7. European Commission Directorate General for Research and Innovation (EC DGRI). 2016. E.U. H2020 Programme Guidelines on FAIR Data Management in Horizon 2020, Version 3.0. Luxembourg: Publications Office of the EU. [WWW Document], URL https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf [Accessed January 07, 2020]. 

  8. European Commission Directorate General for Research and Innovation (EC DGRI). 2017. Evaluation of Research Careers fully acknowledging Open Science Practices; Rewards, incentives and/or recognition for researchers practicing Open Science. Luxembourg: Publications Office of the EU. [WWW Document]. URL https://ec.europa.eu/research/openscience/pdf/os_rewards_wgreport_final.pdf [Accessed January 07, 2020]. 

  9. European Commission Directorate General for Research and Innovation (EC DGRI). 2018. Turning FAIR into reality: final report and action plan from the European Commission expert group on FAIR data. Luxembourg: Publications Office of the EU. [WWW Document]. URL https://op.europa.eu:443/en/publication-detail/-/publication/7769a148-f1f6-11e8-9982-01aa75ed71a1/language-en/format-PDF [Accessed January 07, 2020]. 

  10. Federer, LM, Belter, CW, Joubert, DJ, et al. 2018. Data sharing in PLOS ONE: An analysis of Data Availability Statements. PLOS ONE, 13: e0194768. DOI: https://doi.org/10.1371/journal.pone.0194768 

  11. Hansen, KK, Buss, M and Sztuk Haahr, L. 2018. A FAIRy tale. Zenodo. [WWW Document]. DOI: https://doi.org/10.5281/zenodo.2248200 

  12. Herschel, M, Diestelkämper, R and Ben Lahmar, H. 2017. A survey on provenance: What for? What form? What from? The VLDB Journal, 26: 881–906. DOI: https://doi.org/10.1007/s00778-017-0486-1 

  13. Jacobsen, A, de Miranda Azevedo, R, Juty, N, et al. 2019. FAIR Principles: Interpretations and Implementation Considerations. Data Intelligence, 10–29. DOI: https://doi.org/10.1162/dint_r_00024 

  14. Jones, S, Pergl, R, Hooft, R, et al. 2019. Data Management Planning: How Requirements and Solutions are Beginning to Converge. Data Intelligence, 208–219. DOI: https://doi.org/10.1162/dint_a_00043 

  15. Landi, A, Thompson, M, Giannuzzi, V, et al. 2019. The “A” of FAIR – As Open as Possible, as Closed as Necessary. Data Intelligence, 47–55. DOI: https://doi.org/10.1162/dint_a_00027 

  16. Lannom, L, Koureas, D and Hardisty, AR. 2019. FAIR Data and Services in Biodiversity Science and Geoscience. Data Intelligence, 122–130. DOI: https://doi.org/10.1162/dint_a_00034 

  17. Mabile, L, De Castro, P, Bravo, E, et al. 2016. Towards new tools for bioresource use and sharing. Information Services & Use, 36: 133–146. DOI: https://doi.org/10.3233/ISU-160811 

  18. McQuilton, P, Batista, D, Beyan, O, et al. 2019. Helping the Consumers and Producers of Standards, Repositories and Policies to Enable FAIR Data. Data Intelligence, 151–157. DOI: https://doi.org/10.1162/dint_a_00037 

  19. Mons, B, Neylon, C, Velterop, J, et al. 2017. Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud. Information Services & Use, 37: 49–56. DOI: https://doi.org/10.3233/ISU-170824 

  20. Reymonet, N, Moysan, M, Cartier, A, et al. 2018. Réaliser un plan de gestion de données «FAIR»: modèle [Producing a “FAIR” data management plan: a template]. [WWW Document]. URL https://archivesic.ccsd.cnrs.fr/sic_01690547/document [Accessed January 10, 2020]. 

  21. Sansone, S-A, McQuilton, P, Rocca-Serra, P, et al. 2019. FAIRsharing as a community approach to standards, repositories and policies. Nat Biotechnol, 37: 358–367. DOI: https://doi.org/10.1038/s41587-019-0080-8 

  22. Schultes, E, Strawn, G and Mons, B. 2018. Ready, Set, GO FAIR: Accelerating Convergence to an Internet of FAIR Data and Services. DAMDID/RCDL. http://ceur-ws.org/Vol-2277/paper07.pdf. 

  23. Schwardmann, U. 2020. Digital Objects – FAIR Digital Objects: Which Services Are Required? Data Science Journal, 19(1): 15. DOI: https://doi.org/10.5334/dsj-2020-015 

  24. Stall, S, Cruse, P, Cousijn, H, et al. 2018. Data Sharing and Citations: New Author Guidelines Promoting Open and FAIR Data in the Earth, Space, and Environmental Sciences. Science Editor, 41: 83–87. https://www.csescienceeditor.org/wp-content/uploads/2018/11/CSEv41n3_text_83-87.pdf [Accessed January 07, 2020]. 

  25. Sustkova, HP, Hettne, KM, Wittenburg, P, et al. 2019. FAIR Convergence Matrix: Optimizing the Reuse of Existing FAIR-Related Resources. Data Intelligence, 158–170. DOI: https://doi.org/10.1162/dint_a_00038 

  26. Thompson, M, Burger, K, Kaliyaperumal, R, et al. 2019. Making FAIR Easy with FAIR Tools: From Creolization to Convergence. Data Intelligence, 87–95. DOI: https://doi.org/10.1162/dint_a_00031 

  27. Wilkinson, MD, Dumontier, M, Aalbersberg, IJ, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3: 160018. DOI: https://doi.org/10.1038/sdata.2016.18 

  28. Wilkinson, MD, Dumontier, M, Sansone, S-A, et al. 2019. Evaluating FAIR maturity through a scalable, automated, community-governed framework. Scientific Data, 6: 1–12. DOI: https://doi.org/10.1038/s41597-019-0184-5 

  29. Wilkinson, MD, Sansone, S-A, Schultes, E, et al. 2018. A design framework and exemplar metrics for FAIRness. Scientific Data, 5: 180118. DOI: https://doi.org/10.1038/sdata.2018.118 

  30. Wilkinson, MD, Verborgh, R, da Silva Santos, LOB, et al. 2017. Interoperability and FAIRness through a novel combination of Web technologies (No. e2522v2). PeerJ Inc. DOI: https://doi.org/10.7287/peerj.preprints.2522v2 

  31. Wittenburg, P, Sustkova, HP, Montesanti, A, et al. 2019. The FAIR Funder pilot programme to make it easy for funders to require and for grantees to produce FAIR Data. [WWW Document]. URL https://arxiv.org/abs/1902.11162? [Accessed January 07, 2020].