Introduction

Research data can have enduring value, as long as scientists can use, reuse and combine data sets (). Effective ‘data sharing’ requires that the conditions for access and reuse are clear. In the case of ‘open data’, there are no limitations. Data that is not open can still be made accessible, as long as the procedures for obtaining it are defined (). The need to formalise the conditions that govern data collection, storage and reuse is one of the motivations for the establishment of data management plans (DMPs). These serve as framework documents in research projects () and are necessary tools for data reuse across various scientific and engineering fields. DMPs are now required for many grant applications, including those of various international and national funding bodies, such as the European Commission (see Horizon Europe Data Management Plan Template []) or the French National Research Agency [].

There are ten essential items in any research DMP (): (i) funders’ requirements, (ii) data to be collected, (iii) data organisation, (iv) data documentation, (v) data quality assurance, (vi) data storage and preservation strategy, (vii) project’s data policies (licensing, ethical considerations, etc.), (viii) data dissemination, (ix) team members’ roles and responsibilities, and (x) budgetary aspects. While usually relatively straightforward for single investigator-driven projects, DMPs are more difficult to define and maintain in the case of international, multidisciplinary and/or multi-partner research projects (e.g. ). Moreover, an effective DMP must not become obsolete after a project is completed (), and needs to address the implementation of the Findable, Accessible, Interoperable, Reusable (FAIR) data principles (). Here, we describe the steps taken to establish a DMP for ‘Integrated Services for Infectious Disease Outbreak Research’ (ISIDORe), a large consortium constituted in response to the SARS-CoV-2 pandemic (), with the support of its sister European Open Science Cloud (EOSC) project BeYond-COVID (BY-COVID).
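The ten items above can double as a completeness check on a draft DMP. The following is a minimal Python sketch of such a check; the key names and the `missing_items` helper are illustrative assumptions and are not taken from any funder template.

```python
# Short keys for the ten essential DMP items listed above.
# The key names are illustrative labels, not taken from any official template.
ESSENTIAL_ITEMS = (
    "funder_requirements",
    "data_to_collect",
    "data_organisation",
    "data_documentation",
    "quality_assurance",
    "storage_and_preservation",
    "data_policies",
    "data_dissemination",
    "roles_and_responsibilities",
    "budget",
)


def missing_items(dmp: dict) -> list:
    """Return the essential items that are absent or left blank in a draft DMP."""
    return [item for item in ESSENTIAL_ITEMS if not str(dmp.get(item, "")).strip()]


# Hypothetical draft: only one item is filled in so far.
draft = {
    "funder_requirements": "Horizon Europe DMP template",
    "budget": "   ",  # whitespace only, so it still counts as missing
}
print(missing_items(draft))
```

A check of this kind could be run each time the living document is updated, so that gaps are visible to all partners.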

Challenges for the ISIDORe Project and Its Data

Introduction to ISIDORe

ISIDORe is a three-year project coordinated by the European Research Infrastructure on Highly Pathogenic Agents (ERINHA), funded in 2022 through the Horizon Europe Research and Innovation Programme. It aims to support researchers through the provision of free-of-charge access to services, to facilitate a rapid research response to infectious disease threats, and to improve the EU’s readiness for epidemic-prone pathogens.

ISIDORe brings together 17 major European life-science research infrastructures (RIs) and infectious disease networks, in a multidisciplinary consortium of 154 partners across 32 countries. Collectively, the consortium offers access to an integrated portfolio of cutting-edge resources and research services covering fundamental research, such as structural biology and bio-imaging, as well as diagnostic, therapeutic and vaccine preclinical development, clinical research, epidemiology, social sciences and regulatory matters.

Diversity of Partners and Services

As an international project, ISIDORe’s DMP needs to align with different federal and European requirements. While some of the RIs participating in ISIDORe had already collaborated in the frame of RI cluster projects, such as , and , the ISIDORe project represents a unique challenge, due to the diversity of participating RIs and networks, in terms of their respective sizes, expertise, legal structures and/or working practices. This required a new approach to coordination and collaboration among the different partners. It also made the elaboration of plans for data management, to favour the fast generation of FAIR policies for data sharing, reuse and interoperability, and open access to existing methods and software, such as online tools, workflows and registries, particularly challenging. Data standardisation and accessibility can, however, be maximised by adherence to established principles in data production and management processes, principally FAIR, CARE () and TRUST (), as well as by the implementation of requirements and recommendations from international fora (e.g. , Research Data Alliance [], World Data System [], or the ). Furthermore, to avoid the duplication of effort and a lack of standardisation, it is critical to identify and adapt pre-existing online tools, community-adopted standards and processes, and implement any relevant complementary ones, especially from other multidimensional data science networks. The diversity of partners and services included in ISIDORe requires its DMP to be broadly interpretable while retaining the necessary individual specificities, so that it does not become meaningless. The DMP therefore has to preserve two aspects: the general principles that will be used to approach data management, and the specifics for services and use cases.

Diversity of Partner Maturity

In addition to complying with the various laws, ISIDORe’s DMP must also be compatible with the consortium partners’ diverse data handling procedures. While some, especially in Life Science (LS)-RI communities with extensive experience in data handling (e.g. structural biology, ; high-content screening, ; biological and biomedical imaging, ) have a solid set of internal procedures, implemented in close exchange with the relevant repositories (e.g. , , , ), others still rely primarily on the diligence of individual researchers. The challenge is therefore to accommodate this high heterogeneity. It requires first determining precisely the current state-of-the-art in data management across the RIs and networks, to reconcile differences and avoid duplication of the efforts already in place. The aim is to create an overarching core DMP that encompasses all existing processes and defines minimal procedures for those partners that do not currently have a fully developed DMP.

Dealing with Sensitive Data

A further challenge is linked to the type of data that is generated within the ISIDORe project. Certain partners produce data with intrinsic biosecurity and/or biosafety issues. For others, there are concerns regarding the potential of identifying individuals from clinical or social science data. There is therefore a need to ensure that any approach to sensitive data FAIRification is legally and ethically compliant and ensures confidentiality and privacy.

The ability to conduct meta-analyses of data from cross-disciplinary studies is especially important in emergency situations, such as in a pandemic. A well-designed DMP that also ensures linking and accessing information on already existing data and relevant research findings (such as on pathogens of concern) will provide a roadmap not only on how to handle the data, but also on the establishment of processes to deal with a mix of sensitive and non-sensitive metadata and assess potential risks for data misuse.

ISIDORe Project DMP Building Method

A Survey to Define the Landscape

To overcome the challenges described above, and conceive a robust DMP, ISIDORe’s strategy was to build on existing components. The participating RIs and networks were therefore surveyed about their data management tools, and those of the researchers who are their end users. One important finding of this survey was that even for users of the more data-mature RIs, DMP tools are often not integrated into their project workflows. Therefore, the ISIDORe DMP must indicate i) where researchers can find guidance for the stewardship and preservation of both data and software, ii) where and how DMP tools and content can be easily and efficiently integrated. For the latter, repository selection, licensing, formats and metadata recommendations, including use of identifiers for building linkages to external systems that provide data interoperability, as defined by the FAIR core principles, must be listed and updated during the project. Lastly, roles, responsibilities and the periodicity for updates must be clearly established.

Alignment with Existing Recommendations

ISIDORe implements the data policy description workflow that was created by the Digital Curation Centre (on behalf of the FAIRsFAIR project) and FAIRsharing (). It is designed to help with the creation of FAIR-aligned data policies , adhering to FAIRsharing policy record metadata, the FAIRsFAIR FAIR Data Policy Checklist () and RDA () specifications. In addition, all checklist fields and RDA-endorsed policy features within its scope are available within FAIRsharing policy records, making them accessible to both humans and machines. This creates a FAIR data policy ‘workflow’ from i) FAIR Data Policy Checklist to ii) deposition of the policy and assignment of a digital object identifier (DOI), to iii) submission of that policy into FAIRsharing and/or other appropriate repositories. Following this workflow will ensure the FAIRification of ISIDORe data management policies.

Mapping Requirements for the DMP

The ISIDORe DMP will make use of pre-existing data repository concepts and integrate available data management tools and guidelines as much as possible (Figure 1), including RDMKit, the FAIR Cookbook (), the Data Stewardship Wizard (DSW) (), the ARGOS () DMP service, and the Infectious Diseases Toolkit (). Despite this relatively esoteric landscape, the DMP should be easy to understand by the different ISIDORe partners, so that they can apply its guidelines properly. The toolbox developed for discovery of digital objects related to sensitive data, harmonised across six life-science infrastructures and developed using an iterative process, will serve as an example of how to deal with issues arising across research infrastructures ().

Figure 1 

DMP building scheme for the ISIDORe project.

The ISIDORe DMP is being developed by integrating available DMP guidelines, resources and advice provided by participating RI communities, partner EU projects (such as those shown) and the overall recommendations by the RDA and EOSC. The established DMP procedures, FAIR guidelines, data repositories and metadata will be collected through platforms and tools (such as those illustrated). A live representation of the standards and databases used within the DMP will be available as a FAIRsharing collection. While many LS-RI partners already have DMP workflows in place, the goal and challenge of the ISIDORe project will be to set up cross-disciplinary pipelines, onboarding additional communities and adapting DMP procedures to meet the needs of infectious diseases research (data and tools) for the different disciplines in the consortium.

An investigation of data sharing statements (DSS), as included within the trial registry entries of COVID-19 related studies, revealed strong differences in how the request for details of data sharing has been interpreted, and, when data sharing is described as possible, a huge heterogeneity in the specification of the access procedures (). This indicates that the instructions for the inclusion of DSSs within trial registry data need to be clearer, and where possible, statements should only use standardised categories to indicate the type of access that will be available.

To organise this homogenisation, ISIDORe metadata mobilisation is based on an access management system (AMS). The main purpose of the AMS is to administer responses to calls and requests for services, and to collect feedback from users in a standardised and centralised manner. This AMS, currently in development, will allow the management of the catalogue of services and of project submission information with its associated metadata, and the creation of a community-validated registry of projects (both those submitted and accepted). For certain RIs, any data generated by the provision of a service to a user is channelled into existing registries and repositories. This will be indexed in the ISIDORe AMS and syndicated through metadata tools and registries (described in the Supplementary Material). Metadata mobilisation will be achieved through existing RI AMSs when they exist (Figure 1); associated syndication processes are described in the ISIDORe DMP.

Since the project’s contributors are numerous and diverse, the DMP should be sufficiently flexible to fit these broad needs, whilst retaining minimal common requirements in standards, protocols and rules for metadata, traceability and provenance of data. This work on the ISIDORe DMP may introduce elements that could significantly impact the DMPs of individual project members, making them more general and potentially more ambitious.

FAIR-compliant DMPs covering service management and delivery have, up until now, generally been designed and implemented at the level of the RIs and networks. This type of DMP is distinct from the DMP covering the production of scientific data (Figure 2). For the latter, the onus to support users in DMP compliance has previously generally fallen on the individual service providers (e.g. national or regional core facilities). ISIDORe provides an opportunity to explore ways to harmonise user support. In this, as for all data challenges, ISIDORe is supported by BY-COVID. The BY-COVID project aims to make data and analysis tools related to infectious disease outbreaks more interoperable and accessible, not only to scientists but also to medical staff in hospitals, and policy makers, such as government officials, or indeed any other potential user.

Figure 2 

The two sides of ISIDORe’s DMP approach.

ISIDORe requires a data management plan (DMP) covering both access management systems (AMSs) and scientific data generated through ISIDORe user projects. The scientific data part integrates available LS RI DMP components and adapts them for all disciplines. The AMS part harmonises management metadata from RIs and networks with the support of BY-COVID and in line with EOSC/RDA recommendations. Both approaches are applicable over a broad range of disciplines and must be compatible with the constraints and existing practices of ISIDORe’s diverse partners.

The two projects are working together: BY-COVID provides support, tools, data discovery and dissemination strategies, to enable data from user projects implemented by ISIDORe to be as FAIR as possible, while ISIDORe promotes the adoption of FAIR (meta)data standards by partners, via its network of data stewards, and tries to ensure that data is made available in the public domain, as widely as possible, preferably via dedicated sites such as the . As digital data needs to be actively managed over time to ensure continued availability and usability, depositing data resources with a trusted digital archive helps ensure that they are curated and handled according to good practices in digital preservation (). Through the COVID-19 data portal, BY-COVID provides a platform for data discovery across a broad range of disciplines, with the flexibility to add any relevant data resource proposed by ISIDORe, through a lightweight metadata format jointly developed by both projects (). The collaboration between the two projects will lay the groundwork for Europe’s future pandemic preparedness with FAIR data pipelines, enabling rapid response to outbreak situations, and a broad framework for answering research questions across research disciplines and country borders.

BY-COVID also provides standardised data management and analysis methods (partially based on the Galaxy Project), protocols, and training, to ensure FAIR is an integral part of the data pipeline. Building on existing infrastructure, data repositories are explorable through a single platform (COVID-19 data platform, and in the future, the EBI pathogen portal) allowing users to deposit, access and analyse pathogen data more easily. BY-COVID’s data source catalogue of choice is FAIRsharing. It describes relationships among the data sources (databases, knowledge bases, repositories) and assists with the selection, documentation and visualisation of standards. The use of RO-Crate (), based on FAIR Digital Object (FDO) principles, guarantees efficient data sharing, including data and metadata compliance using community-adopted standards such as Bioschemas and the Common Workflow Language. Workflow repositories such as WorkflowHub that include the use of persistent and unique identifiers (e.g. DOIs []) ensure findability and discoverability of computational workflows and pipelines. Data discovery is further enabled by the BY-COVID data discovery network, based on the principles of the Beacon Network and employing an Application Programming Interface (API) that allows, for instance, discovery of genomic, phenotypic and clinical data.
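To make the RO-Crate packaging mentioned above more concrete, the sketch below builds a minimal `ro-crate-metadata.json` document using only the Python standard library. It follows the general layout of RO-Crate 1.1 (a metadata descriptor, a root dataset and file entities) as we understand it; the dataset name, file paths and the `minimal_ro_crate` helper are hypothetical, and the output has not been validated against the full specification or against any BY-COVID profile.

```python
import json


def minimal_ro_crate(name, description, date_published, license_url, files):
    """Build a minimal RO-Crate 1.1 style metadata document as a Python dict."""
    # Metadata descriptor: points at the root dataset of the crate.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root dataset: describes the crate as a whole and lists its parts.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": path} for path in files],
    }
    # One entity per file contained in the crate.
    file_entities = [{"@id": path, "@type": "File"} for path in files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }


# Hypothetical example values, for illustration only.
crate = minimal_ro_crate(
    name="Example outbreak dataset",
    description="Illustrative crate for a user project",
    date_published="2023-02-06",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=["results/measurements.csv", "protocol.pdf"],
)
print(json.dumps(crate, indent=2))
```

In practice a dedicated RO-Crate library or a community profile would add richer typing (for example authors, workflows and identifiers) on top of this skeleton.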

Building a Bridge Between Partners’ Data

The implementation within ISIDORe of most user projects will generate experimental data. Ensuring that researchers adopt common DMP principles to deal with their data will require provision of advice at an early stage of their projects, and guidance on FAIR principles and the available tools and resources that fit their projects’ data needs, as well as open science requirements. The current DMP strategy in Euro-BioImaging, for example, offers incoming users a consultation with the RI’s FAIR Data steward before data is acquired. On the one hand, this raises awareness of the available guidelines, repositories and tools, and on the other hand, it brings data generation onto the DMP track. It also fosters timely adoption of the DMP with regard to data provenance and life cycle management requirements, as well as readiness for data integration into open science frameworks after data generation and analysis are completed. It will be important for ISIDORe to establish cross-consortium exchanges between individual FAIR data stewards from the different RIs, to support the harmonisation of DMP principles and procedures.

Already, one solution to the problem of a comprehensive, broadly applicable yet specific DMP has been identified. The DMP can be divided into sections, including a general, overarching one, describing overall data management principles (so called ‘core DMP’), and RI/field-specific sections. This may be achieved through several curated sets of questions (knowledge models) created in the Data Stewardship Wizard () and pre-filtered by the different ISIDORe partners, depending on their domain and how the data covered by a specific knowledge model is to be disseminated. The generation of RI-specific resources and repositories will greatly benefit from existing data-related or domain-specific collections (such as those collected through FAIRsharing, ), and through the existing collaboration of RIs with data and workflow repositories. Indeed, in certain cases, these may have already paved the way for easy and fast submission of data.
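The two-tier structure described above can be illustrated with a short Python sketch in which a shared core section is combined with one RI-specific section. The rule that an RI section may add entries but may not redefine core entries is our own simplifying assumption, and all keys, values and the `compose_dmp` helper are hypothetical.

```python
def compose_dmp(core, ri_sections, ri_name):
    """Return the DMP view for one RI: the core answers plus that RI's own section.

    Assumption for this sketch: RI-specific sections may add entries,
    but must not redefine anything already fixed in the core DMP.
    """
    specific = ri_sections.get(ri_name, {})
    clashes = sorted(core.keys() & specific.keys())
    if clashes:
        raise ValueError(f"{ri_name} redefines core entries: {clashes}")
    return {**core, **specific}


# Hypothetical content, for illustration only.
core_dmp = {
    "principles": "FAIR, CARE, TRUST",
    "standards_catalogue": "FAIRsharing",
}
ri_sections = {
    "imaging_ri": {"preferred_format": "open imaging format"},
    "clinical_ri": {"access_model": "controlled access"},
}

print(compose_dmp(core_dmp, ri_sections, "imaging_ri"))
```

An RI without its own section simply receives the core DMP, which matches the idea of minimal procedures for partners that do not yet have a fully developed plan.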

Projects that require the provision of services by two or more RIs, and whose datasets must therefore comply with the requirements and recommendations of those RIs, present a special challenge. Again, the introduction of a two-tier DMP will help to overcome this, although the participating RIs will still need to reach a consensus on the recommendations to be given to users regarding their specific data constraints. One can imagine that this will also be facilitated by the variability of knowledge models within the DSW. Ideally, the RI or network (preferably via its data steward) should be automatically alerted of possible conflicting recommendations or overlapping issues, and thus ISIDORe will be able to provide users with access to coherent and curated catalogues of data, resources and repositories.
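The automatic alert on conflicting recommendations mentioned above could, in its simplest form, compare the recommendations of each RI involved in a project topic by topic. The sketch below is one possible approach under that assumption; the RI names, topics and the `find_conflicts` helper are hypothetical and do not describe an existing DSW feature.

```python
from collections import defaultdict


def find_conflicts(recommendations):
    """Return the topics on which the involved RIs give different recommendations.

    `recommendations` maps an RI name to a dict of topic -> recommended value.
    The result maps each conflicting topic to the per-RI values.
    """
    by_topic = defaultdict(dict)
    for ri, topics in recommendations.items():
        for topic, value in topics.items():
            by_topic[topic][ri] = value
    return {
        topic: values
        for topic, values in by_topic.items()
        if len(set(values.values())) > 1
    }


# Hypothetical recommendations from two RIs serving the same user project.
project_recommendations = {
    "RI_A": {"licence": "CC-BY-4.0", "embargo_months": 0},
    "RI_B": {"licence": "CC-BY-4.0", "embargo_months": 12},
}

for topic, values in find_conflicts(project_recommendations).items():
    print(f"Conflict on {topic}: {values}")
```

The flagged topics would then be passed to the data stewards of the RIs concerned, who would still need to reach a consensus on the advice given to the user.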

Building alignment between consortium partners on the core DMP principles and tools, as well as on the implementation of available resources to develop harmonised DMP workflows, can be fostered by consolidating a consortium-wide data steward team/committee. Data steward representatives from consortium RIs and networks are to be invited to participate in shaping and curating ISIDORe’s DMP, in collaboration with partner projects and initiatives such as those developed through , and , and taking into account the procedural details of their respective DMPs (; ). This will allow the creation of sustainable exchange and cross-community validated solutions on the RI landscape.

Sensitive Data within ISIDORe DMP

Based on the experience of the CORBEL project, with its multi-stakeholder task force, sensitive-data experts previously examined major issues associated with the sharing of sensitive data from individuals. They developed a consensus document on providing access to such individual participant data, using a broad interdisciplinary approach. This included 10 principles and 50 recommendations (), representing the fundamental requirements of any framework used for the sharing of clinical research data. The document, to be included in ISIDORe’s DMP, covers the following main areas: making data sharing a reality (e.g. cultural change, academic incentives, funding), consent for data sharing, protection of trial participants (e.g. de-identification), data standards, rights, types, and management of access (e.g. data request and access models), data management and repositories, discoverability, and metadata. Its adoption helps to promote and support data sharing and reuse among researchers, to inform trial participants adequately and protect their rights, and to provide effective and efficient systems for preparing, storing and accessing data.

Access Management Systems as a Glue

The main objective of the ISIDORe access management system (AMS) is to administer responses to calls for proposals and requests for free access to the project’s services, as well as to collect feedback from users in a standardised and centralised manner. Some of the ISIDORe partners already have a pre-existing AMS, key to the facilitation of good data management in access provision within ISIDORe. Therefore, ISIDORe has conceived a central ISIDORe AMS system as a decoupled ‘headless’ component that will feed and be fed by the partners’ pre-existing AMSs. For that purpose, minimal interoperability elements with the central ISIDORe AMS have been defined which the partners’ AMSs will have to implement in order to provide the central system with up-to-date, formatted content regarding the details of the services proposed and the status of the requests for services that they process.
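As an illustration of what a minimal interoperability element between a partner AMS and the central AMS might look like, the sketch below defines a small, validated status update that a partner system could serialise as JSON. The field names, the list of allowed statuses and the `RequestUpdate` class are assumptions made for this example; they are not the interoperability elements actually defined by ISIDORe.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date

# Hypothetical status vocabulary shared by all partner AMSs.
ALLOWED_STATUSES = {"submitted", "under_review", "accepted", "rejected", "completed"}


@dataclass(frozen=True)
class RequestUpdate:
    """One status update sent from a partner AMS to the central AMS."""

    partner: str
    service_id: str
    request_id: str
    status: str
    updated: str  # ISO 8601 calendar date, e.g. "2023-02-06"

    def __post_init__(self):
        for name in ("partner", "service_id", "request_id"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        date.fromisoformat(self.updated)  # raises ValueError on a malformed date


def to_payload(update):
    """Serialise an update as JSON with a stable key order."""
    return json.dumps(asdict(update), sort_keys=True)


example = RequestUpdate("RI_A", "svc-001", "req-042", "accepted", "2023-02-06")
print(to_payload(example))
```

Validating on the partner side keeps malformed records out of the central registry, whatever software each partner AMS is built on.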

For partners that do not have a pre-existing AMS, or cannot ensure the transmission of the set of minimally-defined data, a new basic AMS system that matches the minimal interoperability criteria with the central AMS will be implemented. This could either be developed from scratch or be based on elements from one of the two pre-existing systems in use by partner RIs, , and the . , , MIRRI, EMBRC and Instruct, use the ARIA AMS. They also use it for managing user requests submitted in response to ISIDORe’s calls for proposals. ARIA can handle many aspects of RI operations, from proposal submission, and the evaluation process, to keeping a record of users’ on-site visits and their feedback. Such a system enables all application and access provision data associated with a proposal to be collected and findable, and is compliant with the General Data Protection Regulation (GDPR). Additional developments are planned for ARIA to allow for the output of an executed project, such as publications and datasets, to also be findable via the AMS. This would be a powerful tool to allow data traceability. EVA-GLOBAL’s AMS is currently based on the Drupal open-source content management system (CMS) and also implements GDPR compliance. Similar to ARIA, the EVA portal manages many aspects of RI operations and service provision. Currently used by the EVA and Infravec networks, it has the advantage of being user-friendly and has demonstrated great adaptability in the context of rapid responses to epidemic and pandemic situations (including the Zika epidemic, COVID-19 pandemic and most recently, the 2022 Mpox outbreak). This flexibility would be a great asset, especially considering ISIDORe’s expected role in rapid research response.

Discussion

Exploiting the ISIDORe Framework

The ISIDORe project is a unique opportunity for the 17 participating RIs and networks to work together to establish a new approach for service provision, to improve pandemic preparedness and accelerate research in response to an unexpected outbreak. The project needs to reconcile the partners’ disparate philosophies, processes and practices, as well as ensuring that they all embrace common contemporary standards, while implementing best practices. The task is made more challenging, but technically more feasible, since ISIDORe has been designed to collaborate closely with its sibling data-focused project, BY-COVID. A necessary part is the constitution of a shared DMP. Researchers report that the benefits from a well-prepared DMP outweigh the initial investment needed to implement rigorous data management practices (). Our overarching aim is thus to establish a working consensual umbrella DMP for this broad, multifaceted and complex community that will be useful for both the ISIDORe members, and the researchers who profit from the consortium’s services.

The Iterative Processing of the DMP

Our goal here was to describe the joint efforts of ISIDORe and BY-COVID towards the establishment of a complete and harmonised DMP, through a living document, allowing contributions from all stakeholders, prior to finalisation of a definitive and functional version.

As DMPs require upgrading in an iterative fashion by all partners involved in their creation, the use of a web-based DMP tool that allows users to contribute in parallel, manages versioning, and provides guidance in building the document, is paramount during the complete maturation cycle of the project’s DMP. In principle ISIDORe will use , rather than any of the other available tools (e.g. DSW, , ).

How to Exploit Pre-Existing Efforts

The ISIDORe DMP has to deal with two facets of the project, the RI/network-centric access provision, and the service-provider level data generation. Thus, it cannot simply be a merging of institutional-level DMPs, but it needs to be designed through a dynamic agreement among all partners, based on the best current protocols to manage and exchange data in the respective fields. The partners with more advanced best practices act as drivers, laying the groundwork to define the content of the DMP, while the active contribution of all the other partners ensures that all needs and perspectives are appropriately considered and integrated in the set-up of a general proof of concept, with successive refinements and adaptations to the respective specificities and requirements.

Stakeholders involved need to develop equitable and sustainable financial models for data sharing, to ensure the long-term resourcing of data preparation and storage, as well as the request and sharing process. These tasks are not easily predictable, and thus not easily linked to initial funding. The discussion regarding sustainable business models for data infrastructures is ongoing, and it is difficult to identify a preferred model yet. A particular problem is that, while many established national and international data repositories have core streams of income from research funders, these sources of income are usually short-lived and may be vulnerable to change in priorities or in responsibilities ().

Diversity and Specificities Integration

To integrate FAIR, CARE and TRUST principles, ISIDORe necessarily adapted basic recommendations to this interdisciplinary, international project. It additionally needed to take into consideration the fact that the individual partners in the project could have conflicting priorities or may have had their own prior implementation of FAIR data sharing. This requires an inclusive harmonisation for data management practices, as a basis for future cross-disciplinary meta-analyses.

The adoption of the FAIR principles has encouraged researchers to comply with data and metadata standards. This need for ‘FAIRification’ has spurred several international efforts (cf. description in Supplementary Material), and members of both the ISIDORe and the BY-COVID consortia are either directly involved in these efforts or engaged in the exploitation of these approaches in the context of their respective projects. One aim of these efforts is to facilitate interoperability between disparate data types. The interoperability principle in FAIRification can only come into play when the semantics of the content are well-defined across heterogeneous data sources (). Ontologies are one of the semantic tools that are frequently used to support interoperability (). The ISIDORe project, therefore, might benefit from the development of an ontology project as a guide to the elaboration of the DMP, especially for the section concerning the implementation of FAIR principles and open data (distinct from the OBO academy’s domain ontologies). Such an ontology project would aim at clarifying the relationships between the entities (in the ontological sense) and metadata, as well as the selection of data standards and data types. Reusing project partners’ existing metadata and data standards might simply be done by merging the information from each partner infrastructure in ISIDORe and agreeing on the most consistent one in case of equivalent candidates. A better approach would, however, be to evaluate for each data/metadata standard, if a ‘linked data’, ‘linked open data’ or even better, ‘FAIR linked open (meta)data’ standard exists that could be used (). Using such linked (meta)data standards would contribute to the semantic web and thus highly increase the potential findability of ISIDORe data, as well as its capacity to be interconnected. It would also give ISIDORe data additional meaning and boost the potential of its syntactic power in data mining processes. 
Wikidata entities and their associated metadata could be good FAIR-linked open metadata candidates for ISIDORe data, as they are an essential part of the current semantic web, validated and maintained by a large community. They could be expanded by the ISIDORe community with consistent novel inputs (). Thus, in an ideal world, by using linked data as metadata, we might enable ISIDORe data to be linked to the semantic web.

Further Steps and Perspectives

The ISIDORe project has a mission to define and implement best practices in collaborative data management, with a fundamental objective to promote their adoption by the community. ISIDORe has the potential to lead the way with innovative aspects related to data controllers, data processors, and future-proofing data sets (both data type and format) regarding storage, processing and transfer. Additional aspects to be considered will include data security, GDPR compliance, quality control, data anonymisation, analysis and traceability of data, as well as communication and knowledge management. Indeed, the heart of the work will consist of data mobilisation planning, to convince all partners of the importance of replicating methods for which there is already an established implementation, or at least a proof of concept, all the while taking into account the human dimension (FAIR literacy [] and the availability of qualified personnel for support, training and the assessment of results []). The quality criteria that are identified at each step will have to be constantly revisited. In the end, dissemination, support for DMP application, training and (self-)assessment must be FAIR and re-used, and could potentially feed into RDA recommendations. To conclude, while many aspects of the ideal DMP laid out here are achievable with current resources, an ultimate challenge for the ISIDORe community could be the implementation of machine actionable DMPs () that adhere to all DMP Common Standards (). But for that to happen, as for most research projects, FAIR DMP Literacy provided by a well-organised network of data stewards must first be improved. This should become more feasible once the Competence Centres already active in some fields become more generalised, including in the planned EOSC cluster project roadmap.
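To give a sense of what a machine actionable DMP enables, the sketch below encodes a small DMP fragment as structured data and runs an automatic check for datasets that are flagged as personal or sensitive yet have an openly accessible distribution. The field names (`dataset`, `personal_data`, `sensitive_data`, `distribution`, `data_access`) are modelled on our reading of the RDA DMP Common Standard and should be verified against the specification; the dataset titles and the `risky_datasets` helper are hypothetical.

```python
# Fragment of a machine actionable DMP, written as a Python dict.
# Field names are modelled on the RDA DMP Common Standard (to be verified);
# all titles and values are hypothetical.
madmp = {
    "dmp": {
        "title": "Example consortium DMP",
        "dataset": [
            {
                "title": "Structural biology maps",
                "personal_data": "no",
                "sensitive_data": "no",
                "distribution": [{"data_access": "open"}],
            },
            {
                "title": "Clinical cohort records",
                "personal_data": "yes",
                "sensitive_data": "yes",
                "distribution": [{"data_access": "open"}],
            },
            {
                "title": "Pathogen characterisation data",
                "personal_data": "no",
                "sensitive_data": "yes",
                "distribution": [{"data_access": "shared"}],
            },
        ],
    }
}


def risky_datasets(plan):
    """Titles of datasets flagged personal or sensitive that have an open distribution."""
    flagged = []
    for dataset in plan["dmp"].get("dataset", []):
        protected = "yes" in (
            dataset.get("personal_data"),
            dataset.get("sensitive_data"),
        )
        is_open = any(
            dist.get("data_access") == "open"
            for dist in dataset.get("distribution", [])
        )
        if protected and is_open:
            flagged.append(dataset["title"])
    return flagged


print(risky_datasets(madmp))
```

Checks of this kind are only possible once the plan is available as structured data rather than free text, which is the main argument for machine actionable DMPs.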

Data Accessibility Statements

CC-BY licence – Data contact: RD. Survey & survey data will be made available after anonymisation with the ISIDORe DMP deliverable.

Preprint for Zenodo: https://doi.org/10.5281/zenodo.7516084 and https://docs.google.com/document/d/1AfLuhEFZl7suqWXbRvgyF_dSpmfhzrQ676mbTx-pLjI/edit?usp=sharing.

PREPRINT submitted: 02/06/2023 in Data Science Journal (DSJ) Special issue title: Data Management Planning across Disciplines and Infrastructures.

Additional File

The additional file for this article can be found as follows:

Supplementary material

Data Management Tools and definitions. DOI: https://doi.org/10.5334/dsj-2023-035.s1