Interconnecting Systems Using Machine-Actionable Data Management Plans – Hackathon Report

The common standard for machine-actionable Data Management Plans (DMPs) allows for automatic exchange, integration, and validation of information provided in DMPs. In this paper, we report on the hackathon organised by the Research Data Alliance in which a group of 89 participants from 21 countries worked collaboratively on use cases exploring the utility of the standard in different settings. The work included integration of tools and services, funder templates mapping, and development of new serialisations. This paper summarises the results achieved during the hackathon and provides pointers to further resources.


INTRODUCTION
The Data Management Plan (DMP) was introduced to document and publish both data management practices and policies that are applied to data throughout its lifecycle. This implies describing the techniques, methods and policies on how data is to be created, collected, documented, processed, accessed, preserved, disseminated as well as the roles and responsibilities of associated actors (Michener, 2015).
The premise behind the concept of a machine-actionable DMP (maDMP) is that information contained within a DMP can be acted upon both by humans and by automated systems, thus addressing some of the limitations associated with traditional DMP documents. To that effect, data management workflows should integrate maDMPs, and data management policies should take into account not only human agents but also machines. maDMPs should support both human-readable and machine-processable representations so that they can act as an interchange format for dissemination and public access (Simms et al., 2017). Providing a machine-actionable representation of a maDMP requires a standardised representation. The Research Data Alliance (RDA) 1 DMP Common Standards (DCS) working group (Miksa, Cardoso, and Borbinha, 2018; Miksa, Neish, et al., 2018; Miksa, Walk, and Neish, 2019) developed an application profile that makes it easier to express information from traditional DMP documents in a machine-actionable way. The DCS maDMP application profile allows for automatic exchange, integration, and validation of information provided in DMP documents. It thus facilitates the exchange of information between systems acting on behalf of stakeholders involved in the research life cycle, such as researchers, funding bodies, repository managers, ICT providers, and librarians.
This paper reports on a hackathon organised by the DCS working group, whose main motivation was to promote the adoption of the maDMP concept by the research community and, in particular, the usage of the DCS application profile for the interchange of maDMPs. To that effect, four main areas were identified: (1) serialisation, to encourage community development of serialisations of the DCS application profile; (2) integration of DMP tools, to promote the compliance or usage of the DCS application profile in the existing DMP tools used by the community; (3) further integration, aimed at the integration of the DCS application profile with existing data management tools and workflows; and (4) funder templates mapping, where existing DMP modules and representations were to be aligned with the concepts in the DCS application profile.
By focusing on these four areas, the hackathon aimed at achieving three primary objectives: (1) to broaden the community focused on maDMPs, (2) to expand the support for maDMPs, and (3) to expose the growing endorsement of the adoption of the DCS application profile in a wide range of settings, enabling the exchange of DMP-specific information in a machine-actionable way. In order to achieve these objectives, participants were encouraged to both submit topics and form hacking teams. After the hackathon activities had finished, participants were asked to report their results by compiling individual topic reports.
The main aim of this paper is to familiarise readers with the recent developments in the field of maDMPs, by summarising and providing pointers to results, projects, and prototypes developed during the hackathon. The paper also provides context on how the results from the hackathon were produced, allowing readers to better interpret them in view of constraints such as the distributed nature of the collaboration, many participants not having been originally involved in the recommendation development, and the limited time to produce results.
The remainder of the paper is organised as follows. Section 2 describes the maDMP hackathon by providing a characterisation of both the organisation of the hackathon and its participants. Section 3 details the submitted topics and provides a short summary of objectives and results per topic. Section 4 reports on the experiences from organising and participating in a virtual online hackathon and on the importance of maDMPs for Open Science. Finally, Section 5 describes the conclusions drawn from the hackathon as well as points that can be tackled in the future.

THE MADMP HACKATHON, A COMMUNITY PRACTICE
With the idea of broadening the maDMP community, improving the current DCS application profile and promoting its adoption, a hackathon was organised from the 27th to the 29th of May 2020. A hackathon is a programming-oriented event where participants gather to work collaboratively at an intensive pace towards possible solutions to particular challenges (Briscoe, 2014; Garcia et al., 2020; Stoltzfus et al., 2017). Such solutions are created by interdisciplinary teams including, for instance, domain experts, designers and developers. In addition to open source projects, either at a prototype or production level, hackathons also boost medium to long term collaborations, with hacking teams typically continuing to work on the proposed solutions, as well as on follow-up solutions, past the original timeline of the event (Briscoe, 2014; Garcia et al., 2020). Following RDA common practices, the maDMP Hackathon was announced via mailing lists and advertised via Twitter. Registration was open to everybody, from newcomers to experts, with participants being given a period of five weeks to both register and submit topics to be addressed during the hackathon.

HACKATHON ORGANISATION
In order to comply with open access principles, information regarding topics, teams and activities was made publicly available on GitHub. 2 The hackathon followed a self-organising spirit, with participants being free to propose topics and establish teams, which all participants were free to join. The teams had time to propose topics before the start of the event in a collaborative spreadsheet. The hackathon opened with a kick-off meeting in which participants discussed the previously collected topics and had a chance to better understand the scope of work proposed for each of them. Based on that, they were able to join the groups of their choice.
During the hacking days, participants used Slack, a communication platform for teams, so that people within and across teams could easily discuss their ideas. Every team usually held several video meetings, depending on their preferred working mode. Every day there was an open video meeting for all teams in which each team reported on its progress, so that everyone had a chance to ask questions and provide feedback.
The hackathon concluded with a wrap-up meeting in which the results produced by each group were presented and discussed. The results of each team are published individually on Zenodo and are referenced in this paper. A video recording of the "Grand Finale", in which every group discusses its outcomes, is also available on GitHub.

PARTICIPANT CHARACTERISATION
In total, 89 participants registered to attend the hackathon. Participants were associated with institutions from 21 distinct countries (see Figure 1). In terms of gender (see Figure 2), 53.9% of participants were male and 46.1% female (48 men and 41 women), a difference of 7.8 percentage points. Concerning participant engagement in teams, as can be seen in Figure 5, 60.7% of registered participants opted to join a team, whereas 39.3% opted not to (54 joiners and 35 non-joiners). Team members represented institutions from 14 different countries (see Figure 3). In terms of gender (see Figure 4), 56.4% of team members were male and 43.6% female (31 men and 24 women), a difference of 12.8 percentage points. In regard to community engagement, 66.3% of registered participants (59 participants) were not members of the DCS working group, whilst 33.7% (30 participants) were registered members, as seen in Figure 6.
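The shares reported for the 89 registered participants follow directly from the head counts. As a quick check, the following minimal Python sketch recomputes them; the helper name `share` and the dictionary labels are illustrative choices, not part of the hackathon materials.

```python
TOTAL_REGISTERED = 89  # registered participants, as reported above


def share(count: int, total: int = TOTAL_REGISTERED) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Head counts taken from the text; the labels are illustrative.
counts = {
    "male": 48,
    "female": 41,
    "joined_team": 54,
    "no_team": 35,
    "not_in_working_group": 59,
    "in_working_group": 30,
}

shares = {label: share(count) for label, count in counts.items()}
for label, percentage in shares.items():
    print(f"{label}: {percentage}%")

# Difference between the two gender shares, in percentage points.
gender_gap = round(shares["male"] - shares["female"], 1)
print(f"gender gap: {gender_gap} percentage points")
```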

HACKATHON TOPICS
There were a total of 12 topics addressed in this hackathon. Topics were distributed across the four main areas of application, which were identified based on the overall motivation for the hackathon (see Table 1). The areas of application were: (1) serialisation; (2) integration of DMP tools; (3) further integration; and (4) funder templates mapping.

SERIALISATION
By definition, the DCS application profile was always intended to have multiple serialisations. In this hackathon, participants were encouraged to submit topics that would tackle the creation of new serialisations or the upgrade of existing ones. Only one submitted topic aligned with the serialisation area of focus. The objective of the Unaturals team was the creation of a new version of the DMP Common Standard Ontology (DCSO) (Cardoso, Ekaputra, et al., 2020). The DCSO is an existing serialisation of the DCS application profile that is expressed in RDF/XML. During the hackathon the team achieved the following results: (1) reorganisation of the existing DCSO by separating the core elements from complementary concepts such as countries or languages; (2) integration of concepts from third-party ontologies into the DCSO; and (3) establishment of a set of basic Shape Expressions (ShEx) 3 constraints for the DCSO, which can potentially be used to check the conformance of DMP documents represented in DCSO with the DCS application profile specification.

INTEGRATION OF DMP TOOLS
The integration of DMP tools area is motivated by the need to promote compliance with and usage of the DCS application profile throughout the DMP tools in use by the research community. Five teams submitted topics that matched this focus area and, overall, these topics tackled six DMP tools.
The Data Stewardship Wizard (DSW) 4 is a DMP tool that guides researchers through the creation of their DMP documents. The objective of the DSW team was to equip the DSW with the ability to use the DCS application profile as an interchange format. This implied being able to both import and export maDMPs compliant with the DCS application profile. The objective was achieved: the latest version of the DSW can import maDMPs in JSON and export maDMPs in JSON, in DCSO, and in a human-readable version (Suchánek et al., 2020).
EasyDMP 5 is a DMP creation tool supporting simple and nested question types, organised in a linear structure. The DCS application profile, on the other hand, has simple and nested data types organised in a tree structure. The objective of the Datatypists team was to represent any missing DCS application profile concepts as a question type in EasyDMP, and to consider how to encode the DCS application profile's nested tree structures, such as the dataset type. The result of this topic was a revised design of the EasyDMP approach to representing the concepts, further detailed in the topic results report (Moa, Hasan, and Philipson, 2020).
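To illustrate the contrast between a nested tree and a linear sequence of questions, the following Python sketch flattens a small maDMP-like structure into an ordered list of (path, value) pairs. It is only one possible encoding and not the design adopted by the Datatypists team; the sample document, its values, and the dotted path notation are assumptions made for this illustration.

```python
from typing import Any, Iterator, Tuple

# Illustrative maDMP-like fragment. The keys follow the general shape of the
# JSON serialisation, but the content is invented for this sketch.
MADMP = {
    "dmp": {
        "title": "Example DMP",
        "dataset": [
            {
                "title": "Survey responses",
                "distribution": [
                    {"title": "CSV export", "format": ["text/csv"]},
                ],
            },
        ],
    }
}


def flatten(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield a (path, value) pair for every leaf of a nested structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{path}[{index}]")
    else:
        yield path, node


# Each pair could back one simple question in a linear questionnaire.
questions = list(flatten(MADMP))
for question_path, answer in questions:
    print(f"{question_path} = {answer!r}")
```

The list indices in each path preserve enough information to rebuild the tree, which is the property a linear tool would need in order to export nested types such as datasets and their distributions.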
The DMP Exchange team included developers from the following DMP tools: DSW, EasyDMP, DMPOnline, DMPtool and Haplo. 6,7,8 The objective of the team was to determine whether the DCS application profile serialised in JSON could be used as an interchange format across various DMP tools. To do so, mappings across their internal DMP models were established, and the tools were to be equipped with the ability to both import and export maDMPs. Team members were able to update or create mechanisms that allowed maDMPs to be exported by the participating DMP tools. However, the objective of allowing the DMP tools to import maDMPs was only partially achieved, as that requirement was fulfilled by only one of the tools (i.e. the Data Stewardship Wizard). An extended description of the results of this topic is presented in its results report (Faure et al., 2020).
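The export and import steps described above can be sketched as a round trip between a tool's internal model and a JSON maDMP. The Python example below is a minimal sketch under stated assumptions: the internal record `INTERNAL_PLAN` and its keys are hypothetical, only a handful of fields is covered, and the fields checked on import are a choice made for this sketch rather than the list of mandatory fields defined by the DCS application profile.

```python
import json

# Hypothetical internal record of a DMP tool; these keys are invented.
INTERNAL_PLAN = {
    "plan_name": "Example DMP",
    "owner_name": "Jane Doe",
    "owner_email": "jane.doe@example.org",
    "outputs": ["Survey responses", "Interview transcripts"],
}

# Fields this sketch insists on when importing (an assumption, see lead-in).
REQUIRED_ON_IMPORT = ("title", "contact", "dataset")


def export_madmp(plan: dict) -> str:
    """Map the internal record onto a JSON maDMP document."""
    madmp = {
        "dmp": {
            "title": plan["plan_name"],
            "contact": {"name": plan["owner_name"], "mbox": plan["owner_email"]},
            "dataset": [{"title": title} for title in plan["outputs"]],
        }
    }
    return json.dumps(madmp, indent=2)


def import_madmp(document: str) -> dict:
    """Parse a JSON maDMP and map it back onto the internal record."""
    payload = json.loads(document)
    dmp = payload.get("dmp") if isinstance(payload, dict) else None
    if not isinstance(dmp, dict):
        raise ValueError("missing top-level 'dmp' object")
    missing = [field for field in REQUIRED_ON_IMPORT if field not in dmp]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {
        "plan_name": dmp["title"],
        "owner_name": dmp["contact"]["name"],
        "owner_email": dmp["contact"]["mbox"],
        "outputs": [dataset["title"] for dataset in dmp["dataset"]],
    }


# A tool supports the interchange format for these fields if the round trip
# returns the record it started from.
round_tripped = import_madmp(export_madmp(INTERNAL_PLAN))
print(round_tripped == INTERNAL_PLAN)
```

Import is the harder direction in this sketch as well: export only has to produce a valid document, whereas import has to cope with documents written by other tools, including ones that omit fields the importing tool depends on.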
The Research Data Management Organiser (RDMO) 9 is an open source tool that helps institutions and researchers with planning and executing data management activities. It uses an internal vocabulary called "domain" to map and abstract the user's input into questionnaires together with any other relevant information. The objective of the RDMO team was to map RDMO's domain to the DCS application profile and then build an export functionality. The team was able to map most of RDMO's domain to the DCS application profile, and to establish a prototype of the export functionality (Klar et al., 2020).
The OpenDMP software 10 is a data management planning tool that was created by a consortium comprising OpenAIRE 11 and EUDAT CDI. 12 The OpenDMP tool has been implemented by two DMP tools, Argos 13 and EasyDMP. The team focused on Argos, and had two objectives: (1) to implement an import and export function that would allow the usage of DCS application profile compliant maDMPs as an interchange format; and (2) to establish mappings between the OpenAIRE Research Graph model (Manghi et al., 2019) and the DCS application profile. A description of the results of this topic can be found in its results report (Tziotzios et al., 2020).

FURTHER INTEGRATION
The further integration area was intended to cover topics whose objective was to integrate the DCS application profile into existing data management frameworks and workflows. Four teams submitted topics that matched this focus area.
The objective of the maDMP Link team was to enable both DMP Roadmap 14 and Figshare 15 to use the DCS application profile as an interchange format for maDMPs. The team was able to develop a prototype for Figshare that allowed maDMPs serialised in JSON to be both imported and exported (Zimmer et al., 2020).
The InsTmaDMP team aimed at analysing research data management (RDM) workflows from four distinct research institutions. They opted to focus their analysis on the DMP creation processes of these workflows. The objective was to analyse the resulting DMP documents and identify any changes that would be necessary (i.e. addition, removal or editing of DMP concepts) to create DCS application profile compatible maDMPs. The result was a list of recommendations for changes in the analysed RDM workflows. A detailed description of the results of this topic can be found in its report (Karimova et al., 2020).
The Something team set out to analyse the data management workflows from the Climate Community in the EOSC-Nordic project, 16 and to establish mappings between the DCS application profile and existing data management concepts. The objective was to be able to represent the data management workflows with DCS application profile compliant maDMPs. The team was unable to completely map the analysed workflows to the DCS application profile, and as such they considered creating an extension to the DCS application profile. The full results are available in the team's results report (Hasan, Fouilloux, and Jacquemot, 2020).
The objective of the DMP InvenioRDM team was to establish mappings between the InvenioRDM 17 data model and the DCS application profile. The team was able to map the DCS application profile concepts to the InvenioRDM data model, and developed a prototype that allowed maDMP serialisations to be imported into InvenioRDM. The team results can be found in the results report (Wali et al., 2020).

FUNDER TEMPLATES MAPPING
The funder templates mapping area covered topics that aimed at establishing mappings between the DCS application profile and DMP representation models from institutions, funding bodies or other stakeholders in the community. Two teams submitted topics matching this focus area.
The objective of the Tigtag team was to establish mappings between the DCS application profile and several of the DMP templates most commonly used by funding bodies. As DMP templates are typically a list of questions, the team proceeded to analyse each individual question and map it to one or more matching fields in the DCS application profile. The result of this process was the proposal of a funder extension to the DCS application profile, which adds a set of concepts that are required to completely map all of the analysed DMP templates but are not present in the DCS application profile. The results are further detailed in the results report (Cardoso, Jones, et al., 2020).
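The question-by-question mapping described above can be sketched as a simple lookup from template questions to field paths, where questions without a matching field point to concepts that an extension would have to add. The Python example below is a minimal sketch: the questions are hypothetical, the field paths are illustrative, and neither is taken from the team's report.

```python
from typing import Dict, List

# Hypothetical funder template: each question maps to zero or more field
# paths. Questions and paths are illustrative, not the team's actual mapping.
TEMPLATE_MAPPING: Dict[str, List[str]] = {
    "What data will you collect or create?": ["dataset.title", "dataset.description"],
    "Under which licence will the data be shared?": ["dataset.distribution.license.license_ref"],
    "Who is responsible for data management?": ["contact.name", "contributor.role"],
    "What costs do you expect?": ["cost.title", "cost.value"],
    # Assumed to have no counterpart, so it is a candidate for an extension.
    "How does the plan comply with funder policy?": [],
}


def unmapped_questions(mapping: Dict[str, List[str]]) -> List[str]:
    """Return the questions that have no matching field."""
    return [question for question, fields in mapping.items() if not fields]


def coverage(mapping: Dict[str, List[str]]) -> float:
    """Return the fraction of questions mapped to at least one field."""
    if not mapping:
        return 0.0
    mapped = sum(1 for fields in mapping.values() if fields)
    return mapped / len(mapping)


gaps = unmapped_questions(TEMPLATE_MAPPING)
print(f"Coverage: {coverage(TEMPLATE_MAPPING):.0%}")
for question in gaps:
    print(f"Needs an extension concept: {question}")
```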
The Fancycatmeme team had the objective of automating quality-control metrics in the Linked Data Pipeline 18 for the Research Data Connectome. 19 To accomplish this objective, the team first analysed a series of existing DMP documents and proceeded to create DCS application profile compliant maDMPs. Secondly, they provided a series of recommendations on how to approach the path to maDMPs for research institutions, funding bodies and other stakeholders in the ecosystem. The team results can be found in the results report (Rettig et al., 2020).

DISCUSSION: BEYOND MADMP
Our discussion revolves around two main topics: experiences from organising and participating in a virtual online hackathon, and the importance of maDMPs for Open Science.

A VIRTUAL HACKATHON, LESSONS LEARNT
Hackathons, also known as CodeFests or Programming Sprints, are becoming more and more common within the scientific community, lasting from a couple of days to a whole week. Logistics may differ depending on the length of the event and the number of people involved. However, there are some common elements, such as defining goals, ensuring balanced and representative participation, using effective communication channels, promoting a respectful environment, encouraging team building and self-organisation, and including retrospective sessions. Given the high degree of interaction, hackathons are commonly organised face to face. However, due to the COVID-19 pandemic in 2020, the maDMP hackathon, like most scientific events running at that time, was organised as a virtual online meeting.
Cardoso et al. Data Science Journal DOI: 10.5334/dsj-2021-035
One of the main challenges was keeping a high level of participant interaction. Organisers led the common activities, namely the pitching and wrap-up meetings, and made sure these were announced in advance by sending regular reminders and posting announcements in Slack. Repeating information multiple times to ensure that it reaches the target audience is a common practice, particularly in virtual events. Emoticons, animated GIFs and other eye-catching elements play an important role in virtual communication, as they convey emotions beyond the text. Organisers relied on team leaders for internal communication with the participants joining their effort. Online meeting rooms and Slack were used continuously. As this was a two-day hackathon, keeping focus was not a major issue: participants committed to (mostly) clearing their agendas for two days so that they could actively participate in their selected projects.
Finally, a post-hackathon survey was shared with participants. In total, 13 participants answered the survey, and the feedback was positive overall. Four of the survey questions are worth highlighting: (1) on the overall assessment of the event, the majority (69.23%, 9 participants) of survey responses rated the event as "very good"; (2) on its organisation, the majority (61.54%, 8 participants) of survey responses rated the event as "very organised"; (3) on its length, the majority (76.92%, 10 participants) of survey responses rated the event's duration as "about right"; and (4) on the likelihood of participating in similar future events, the majority (61.54%, 7 participants) of survey responses described it as "extremely likely".

MADMPS AND OPEN SCIENCE
Open Science advocates for open access to research objects (e.g. data, software, workflows, DMPs); however, open access on its own is not enough to ensure reproducibility and advance science, two of the aims of Open Science (Ali-Khan et al., 2018). Effective open data access requires researchers producing data to accompany their data with a DMP, as it covers the whole data life cycle, from collection to archival, and includes guidance on how to use this data in the future. As with any other research object, DMPs should include enough metadata for them to support the FAIR principles (Wilkinson et al., 2016). Given the number of steps covered by DMPs and the metadata involved (e.g. techniques, methods, policies), maDMPs are a natural evolution of DMPs. maDMPs make it easier to continuously and systematically monitor a DMP from start to end. Furthermore, the DCS application profile guides researchers in including the metadata necessary to make maDMPs FAIR. Whenever a funding agency demands an update on a project it is funding, maDMPs enable researchers to quickly produce an up-to-date view with less manual effort than would be needed for traditional DMPs. A limitation here is the variety of templates used by funding agencies, a topic that was tackled by two of the hackathon teams. These efforts can be extended to other funding agencies across different research domains, making maDMPs an ideal companion that offers better support for the FAIR principles and Open Science.

CONCLUSIONS AND FUTURE WORK
The DCS working group has entered maintenance mode. It periodically reviews the recommendation and, only if needed, makes new releases based on the community feedback collected. The group, being an active community of maDMP users, helps to promote the adoption of its recommendations, in this particular case the adoption of the DCS application profile and its serialisations as an interchange format for maDMPs. The maDMP Hackathon was therefore an important event in pursuing this overarching goal. The hackathon was expected to achieve three primary objectives:
1. To grow the maDMP community. The maDMP hackathon had 89 registered participants, of whom 59 were not previously associated with the DCS working group (see Section 2.2). Considering this, it is possible to state that the maDMP Hackathon succeeded in helping the maDMP community to grow, as it is expected that participants will continue to engage with the community in the future.
2. To increase the support for maDMPs. The topics addressed in Sections 3.2 and 3.3 showcase the work developed by participants in pursuit of this objective. Tools and solutions still need time to mature. However, in the context of this hackathon, the prototypes and mappings that were created are a step towards achieving this objective. Furthermore, in the case of the Argos and EOSC Nordic teams, the work started during the hackathon continued afterwards and resulted in the adoption of maDMPs, as described in adoption stories 20,21 submitted to RDA. The same applies to the ontological representation of the DCS, which was published in the proceedings of a peer-reviewed workshop (Cardoso, Garcia Castro, et al., 2020).
3. To provide exposure to the adoption of the DCS application profile as a means to exchange DMP information in a machine-actionable way, in multiple contexts. This objective posed a challenge that was larger than what was attainable in the scope of this hackathon. This paper, and other papers reporting on the topics addressed in this hackathon, can contribute to the attainment of this objective. However, due to its nature, the impact of the efforts towards the completion of this objective can only be measured in the long term.
Overall, the hackathon can be considered a successful event. Participants provided results pertaining to all of the identified objectives. It is important to point out that the objectives can all be considered open problems, where there is always the potential to strive for better results. As such, the RDA DCS working group will continue to support the adoption of the DCS application profile by the community. In the post-hackathon feedback, participants have already expressed their willingness to take part in future hackathons or similar events that would promote the integration of the DCS application profile with other DMP tools or systems.