Introduction

Agricultural data is generated from a variety of sources in increasing volumes () and at increasing cost (). There is a recognised need for data sharing between researchers and the broader community (), but data is often locked up in inaccessible formats and locations, lacking metadata, licence information, attributions and provenance information. This inhibits data discovery, reuse, and federation, which hinders potential new discoveries and innovations (; ; ).

These issues exist across multiple domains, and the FAIR Principles have emerged as an overarching guide to help make data more findable, accessible, interoperable and reusable (). The FAIR principles are intended as a broad guide that is domain-agnostic, but interpretation and implementation, including the development of specific metrics to measure FAIRness, require further effort (), which includes community involvement (; ) and may vary between domains (; ).

FAIR implementations seek to leverage investments in digital objects and foster cross-domain data federation to facilitate new discoveries by, for example, illuminating some of the climatic, environmental, and societal linkages with agriculture (). Digital objects include datasets and associated metadata (; ), as well as models and algorithms (; ), vocabularies and ontologies (; ) and software (; ; ).

Specific FAIR implementations were first developed in the life sciences () and then in other domains (). More recently, agriculture-specific implementations have begun emerging (). To support different domains and data communities, the WorldFAIR Project () is developing FAIR Implementation Profiles (FIPs) (), but to date no agriculture FIP exists. The Agricultural Research Federation (AgReFed) is an Australian cooperative of communities of data providers, whose technical council developed a set of FAIR thresholds for data and services (). Existing FAIR assessment tools (both manual and automatic) use numeric scoring systems or star ratings, which are unable to express the minimum thresholds adopted by AgReFed.

Background

AgReFed was initiated in 2018 with funding from the Australian Research Data Commons (ARDC) to address challenges with FAIR data in agriculture. During the enactment phase, a community of agricultural research data users and providers was formed, and the governance and stewardship model was developed ().

AgReFed’s common vision is to ‘enable Findable, Accessible, Interoperable and Reusable (FAIR) agricultural data to accelerate innovation and increase the profitability and sustainability of Australian agriculture’ (). The AgReFed community developed a domain-specific FAIR assessment that defines thresholds based on minimum standards, acceptable standards and stretch goals (). AgReFed recognises that FAIR is a journey that is supported through education and tooling (; ) and that FAIR assessments are among the tools and technologies that assist with the provision of FAIR data (; ).

AgReFed provides infrastructure, resources, and tools as well as a framework for governance and data stewardship, which are accessible via AgReFed’s website. AgReFed, being a data federation, requires data and access controls to be maintained by data providers, ideally within trusted repositories (). Data providers are also responsible for the creation and maintenance of adequate metadata and for attaining and sustaining an acceptable level of FAIRness (). FAIR assessments should be conducted on all agricultural research datasets, or at least on exemplars of similar collections. In reflecting on the development of AgReFed’s FAIR thresholds, Wong et al. () suggest an automated assessment tool as a next step.

Education, minimum metadata standards and appropriate organisational policies will improve the level of FAIR data. Although this is expected to lessen the need for FAIR assessments over time, they will continue to be useful to data providers in meeting FAIR requirements in the agricultural science community.

Motivation for Software

Given that FAIR implementations are domain-specific () and that assessment tools support FAIR implementations (), it can be argued that the agricultural research domain needs a suitable FAIR assessment tool. Where existing tools do not provide suitable solutions, a community can overcome this implementation challenge by creating new tools, which in turn contribute to the FAIR ecosystem (). As none of the existing FAIR assessment tools provided the functionality required to support AgReFed’s goals, a new tool was created, reusing the existing F-UJI Application Programming Interface (API) to test the machine readability of digital resources. The new FAIR assessment tool was released as open source, making it a reusable contribution to the broader FAIR ecosystem, and an effort was made to make the assessment tool itself FAIR, all of which is in the spirit of the Open Science concept ().

Methods

In this work, I introduce the tool by outlining previous research, addressing the limitations of existing FAIR assessment tools and demonstrating how I used the insights gained during community consultation within AgReFed to design software, which is a novel approach to FAIR assessments. I also describe the structure of the assessment tool and demonstrate its functionality.

Prior Research and Existing FAIR Assessment Tools

At the time of writing, dedicated FAIR assessment tools for the agriculture domain could not be found, except for the initial spreadsheet-based AgReFed assessment. However, several assessment tools exist in the broader data community. The Research Data Alliance’s FAIR Data Maturity Model Working Group analysed 12 different FAIR assessment tools with regard to alignment with the FAIR principles and their specific facets (); however, the tools were not evaluated with regard to features or usability. Krans et al. () reviewed 10 tools for characteristics such as level of user expertise required, guidance provided, ease of use, types of input and output, tool maturity and recommendations for improvement for making a dataset more FAIR. The FAIR assessment tools listed in both aforementioned reviews, as well as five tools compared by Gehlen et al. (), were reviewed as potential candidates for use within AgReFed. The following section lists the outcome of this review.

Limitations of Existing FAIR Assessment Tools

During the development of the minimum thresholds for AgReFed, exemplar datasets were initially found to vary in findability, that is, whether they had a permanent identifier, a metadata record and whether that metadata record was indexed in a searchable repository (). Fully automated assessment tools are unable to capture any FAIR metrics in scenarios where a dataset does not at least have a Uniform Resource Locator (URL) and a machine-readable metadata record. Systematic analyses testing the FAIRness of agricultural datasets using the FAIR Evaluation Services () and the F-UJI API () illustrate both the potential and the limitations of automated assessment tools.

Generic FAIR assessment tools lack domain specificity. Every domain has its own standards, or de facto standards, conventions, accepted file types and licences, and therefore should have its own set of FAIR metrics (). The FAIR Evaluation Services addresses this issue through an open authoring framework where communities can create and publish their own metrics (), whereas other assessment tools provide a fixed set of generic or domain-specific FAIR metrics. A further consideration is that metrics can become outdated as data communities evolve, making a case for updatable tests.

FAIR assessment results should be permanently recorded for purposes such as compliance checking and auditing when required, as well as for future analysis. Some online tools will save all results in a public listing, which can be accessed at any time (for example, in the FAIR Evaluation Services – https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/#!/collections/new/evaluate), whereas others allow the export of the assessment result (e.g., JSON output from the online F-UJI Tool – https://www.f-uji.net). However, most assessment tools only display the results as a percentage or graphic representation without allowing them to be saved.

None of the tools investigated allow the reassessment of digital resources that have already been tested or provide for an easy comparison of multiple assessments of a resource. A reassessment of digital resources in existing assessment tools is possible only by re-entering all information as a new assessment. Even where it is possible to export the assessment results, comparisons of an initial FAIR assessment with subsequent assessments are not straightforward and require manual processing. AgReFed places a strong emphasis on the journey towards improvement in FAIRness. Therefore, an easy comparison between the initial and subsequent assessment results is considered essential.

In most cases, assessment tools provide no support to users moving towards improving the FAIRness of digital resources beyond providing a score (). AgReFed intends to facilitate self-assessment. This requires that users are provided with contextual help and linked resources, particularly in the early stages. Help resources within the user interface are limited in most of the assessment tools. Therefore, prior knowledge of FAIR is required; otherwise, a user must search for information online while conducting the assessment.

Brief Overview of Functionality and Running Environment

The AgReFed FAIR Assessment Tool is a web-based application that runs in a browser and measures a digital resource (at this stage, a dataset, whether tabular, a map-based data product or data delivered by a service) against the FAIR thresholds set out in the AgReFed Technical and Information Policy Suite (). The tool is a full-stack PHP application (using the Laravel framework) with a PostgreSQL database server-side and JavaScript (using the Vue framework) client-side. The FAIR assessment questions, selectable answers, scores and ancillary information are loaded dynamically from the database as a JSON object. All user responses and submitted FAIR assessments are saved in the database. Users are required to sign up to use the tool so that usage metrics can be collected and assessments can be attributed to users, as described in the tool’s disclaimer. This data collection was approved by Federation University’s Human Research Ethics Committee (HREC approval no: A21–046). This approval principally relates to interviews that were conducted in the early stages of AgReFed but also covers the usage metrics for the assessment tool.
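As a sketch of this dynamic loading, the following shows how a question set delivered as JSON might be parsed and sanity-checked client-side. The field names and answer structure are illustrative assumptions, not the tool's actual schema; plain JavaScript stands in for the tool's Vue code.

```javascript
// Hypothetical shape of one assessment question as delivered by the
// server; field names are illustrative, not the tool's actual schema.
const sampleQuestions = JSON.stringify([
  {
    id: 'F1',
    section: 'Findable',
    text: 'Does the dataset have a persistent, unique identifier?',
    answers: [
      { label: 'No identifier', score: 0, threshold: 'below' },
      { label: 'URL only', score: 1, threshold: 'below' },
      { label: 'Persistent identifier (e.g. DOI)', score: 3, threshold: 'minimum' },
    ],
  },
]);

// Parse and sanity-check a question set before rendering the form.
function loadQuestions(json) {
  const questions = JSON.parse(json);
  for (const q of questions) {
    if (!q.id || !Array.isArray(q.answers) || q.answers.length === 0) {
      throw new Error(`Malformed question: ${q.id ?? '(missing id)'}`);
    }
  }
  return questions;
}
```

Validating the payload before rendering keeps the form robust when the metric set in the database is updated or swapped.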

Structure

The AgReFed FAIR Assessment Tool consists of the assessment form, an assessment listing and an assessment results page. Users can access only their own assessments. Additional information on the home page and the help page is accessible to all users.

The assessment form (shown in Figure 1) allows the user to enter details about the dataset to be assessed, including the name, description and reason for the assessment. This section is followed by 14 questions relating to the FAIR principles that are split into the sections Findable, Accessible, Interoperable and Reusable. Every section heading and every individual question has a clickable help link, which opens a pop-up containing external supplementary information.

Figure 1 

FAIR assessment form showing first two questions and visual user feedback.

All questions are multiple-choice. The selected answer gives immediate visual feedback in the form of a bar graph and a pop-up indicating whether AgReFed thresholds are met. Additional information can be entered for each question. Depending on the selected answer, supporting evidence (for example, URLs) is requested. A user can also select an implementation status and enter other comments, for example, additional evidence or to-do notes.
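This immediate feedback can be sketched as a simple mapping from the selected answer's threshold level to the colour and message displayed. The level names, colours and wording below are assumptions for illustration, not the tool's actual values.

```javascript
// Map an answer's threshold level to the visual feedback shown in the
// pop-up and bar graph. Level names, colours and wording are illustrative.
function thresholdFeedback(level) {
  switch (level) {
    case 'stretch':
      return { colour: 'green', message: 'Stretch goal met' };
    case 'ideal':
      return { colour: 'green', message: 'Acceptable (ideal) standard met' };
    case 'minimum':
      return { colour: 'amber', message: 'AgReFed minimum threshold met' };
    default:
      return { colour: 'red', message: 'Below the AgReFed minimum threshold' };
  }
}
```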

Form contents are automatically saved after every change so that a user may reopen an unsubmitted assessment for editing at any time. This also prevents data loss if the form is accidentally closed. When complete, the assessment can be submitted, and upon submission, the results page will be displayed. When a user has chosen to reassess a dataset, the same assessment form will be opened, pre-filled with the answers from the most recent assessment.
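A minimal sketch of this behaviour, assuming a debounced save (so rapid edits collapse into one request) and a pre-fill step along the lines described above; the function and field names are hypothetical:

```javascript
// Debounce helper: collapse rapid form changes into a single save call.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Pre-fill a reassessment form from the most recent submitted assessment;
// answers carry over, while the submission flag is reset. Field names
// are illustrative assumptions.
function prefillReassessment(lastAssessment) {
  return {
    resource: lastAssessment.resource,
    answers: { ...lastAssessment.answers },
    submitted: false,
  };
}
```

In the real application, the debounced callback would persist the draft to the server, so an unsubmitted assessment survives an accidentally closed browser tab.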

The assessment listing displays a list of assessments that the logged-in user has created previously. The entries in the list indicate the status of each assessment, which may be editable (if an assessment is a draft that has not yet been submitted) or not editable (if the assessment has been submitted). Each assessment in the list has a clickable link to view the complete assessment.

The assessment results page displays results for one or more assessments of a single digital resource. It consists of three sections: the first describes the resource, the second shows scores, and the third (and main) section contains tabs for each assessment and reassessment (where applicable). Figure 2 shows an example of a result with multiple assessments.

Figure 2 

FAIR assessment results page example (abbreviated).

In the scores section, bar graphs indicate assessment scores for each of the F, A, I and R categories, as well as the overall FAIRness of the assessment (green) and the machine-readability scores obtained from the embedded F-UJI API (blue). If a resource was reassessed, as in the example shown, these bar graphs are stacked, showing the initial and reassessment scores together to clearly indicate any improvement in FAIRness.

The main part of the results page shows the AgReFed compliance level for each of the 14 metrics, together with additional information that the user entered for each question. Where the user has added notes such as actions to take, this section of the results page will serve as a to-do list for improvements to undertake before attempting a reassessment.

Before an assessment is submitted, it can be resumed and edited, as indicated by the presence of an edit button. Once an assessment has been submitted, it cannot be edited again, but the user can choose to reassess the digital resource. For each assessment, results can be exported in Portable Document Format (PDF) or Comma-Separated Values (CSV) format.

The home page contains a brief statement about the purpose of the tool and links to the help page, the AgReFed website and the Plain Language Information Statement (PLIS) that informs and relates to the ethics approval. For signed-in users, the home page also displays links to start a new assessment and to list existing ones.

The help page provides information about the tool and the related AgReFed technical policy. It also contains help and advice on the features and how to use the assessment tool.

Discussion

This FAIR assessment tool has been developed specifically to address the needs of a data federation in the agricultural sector and is the result of a community approach to greater FAIRness of data in this domain.

The Agricultural Research Federation (AgReFed) Technical and Information Policy Suite () describes a set of 14 FAIR questions with corresponding minimum, ideal, and stretch goal requirements for agricultural datasets. These formed the basis of a spreadsheet containing the questions and associated possible answers, FAIR thresholds and required evidence that make up the AgReFed FAIR assessment. Based on this initial spreadsheet version, the AgReFed FAIR Assessment Tool was developed as a web-based application that presents these questions in a form with multiple-choice answers. This tool also implements the learnings and recommendations of Wong et al. () in relation to FAIR assessments that followed the completion of the initial stages of AgReFed.

During the development and early testing of the alpha version of the tool, additional user interface features were identified, subsequently implemented and further refined during beta testing. These additional features aimed to improve the user experience as well as the usefulness of the tool in a data custodian’s journey to improve the FAIRness of a digital resource.

Unlike other FAIR assessment tools, this tool’s primary purpose is not to provide a FAIRness percentage or other arbitrary score but rather to determine whether a digital resource meets the minimum or ideal thresholds set out in the AgReFed Tech Policy document. The tool provides clear and unambiguous feedback to the user both visually, through colour coding, and in written form.

A scoring system was added to show improvements in FAIRness (if any) when a digital resource is reassessed. For example, the findability of a dataset is improved by moving it from local storage to a place on the Internet and making it accessible via a URL. As the AgReFed minimum threshold requires a URL with a unique and persistent identifier (e.g., a Digital Object Identifier (DOI)), an ordinary URL will not suffice to achieve compliance. However, any URL is an improvement over none, and therefore the data custodian going through the process of making the dataset FAIRer obtains improved scores to indicate progress, even if the AgReFed minimum requirements are not yet met. The scores for each assessment are shown as bar graphs, giving an indication of the increasing levels of F, A, I, R and overall FAIR score from the first to subsequent assessments. This feature was added to encourage data custodians in their efforts to make resources more FAIR.
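The scoring described above can be sketched as a per-category sum of answer scores, with the improvement between two assessments computed as the difference. The score granularity and data shapes here are illustrative assumptions, not the tool's actual scoring rules.

```javascript
// Sum answer scores per FAIR category. `answers` maps a question id to
// the index of the selected answer; `questions` follow the hypothetical
// schema used earlier (section names 'Findable', 'Accessible', ...).
function categoryScores(answers, questions) {
  const totals = { F: 0, A: 0, I: 0, R: 0 };
  for (const q of questions) {
    const selected = q.answers[answers[q.id]];
    if (selected) totals[q.section[0]] += selected.score;
  }
  return totals;
}

// Per-category difference between two assessments of the same resource.
function improvement(before, after) {
  const delta = {};
  for (const cat of Object.keys(before)) delta[cat] = after[cat] - before[cat];
  return delta;
}
```

A positive delta in any category can then drive the stacked bar graphs, signalling progress even where the minimum threshold is not yet met.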

To further encourage and support the progression of digital resources from less to more FAIR, the tool includes entry fields to indicate the implementation status and freeform comments or notes. This seemingly simple additional functionality elevates the tool from a mere assessment tool to a more integrated solution that supports improvement towards AgReFed minimum standards (or higher). A data custodian can use the notes field so that the assessment output can serve as a worklist for future improvements and, if required, the list can be printed. Alternatively, the notes field can be used to provide additional evidence or other relevant comments.

The AgReFed FAIR Assessment Tool uses a hybrid approach; it is manual in the sense that the data custodian or other person running the assessment must enter data into a form and select answers for multiple-choice questions. However, it is integrated with the F-UJI tool’s API, which performs an automated assessment and provides feedback on the machine readability of the digital resource to be assessed (). As previously stated, an automated tool will not be able to provide any scores for datasets that are not online or not listed in a catalogue; therefore, it will not be able to support or give good feedback to data custodians whose resources are initially at a very low level on the FAIR scale. AgReFed’s goals include providing guidance and support, so a manual approach incorporating plentiful help and feedback was chosen. Once a digital resource has been improved to where it is findable and has machine-readable metadata, integration with the F-UJI tool will yield additional insights regarding the machine-readability of metadata related to interoperability and reusability. This approach combining the advantages of manual and automated approaches is supported by the findings from assessment tool evaluations (; ) and has been adopted in other domains ().
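A sketch of this integration follows. The endpoint path, the `object_identifier` request field and the `summary.score_percent` response field reflect the publicly documented F-UJI API at the time of writing but should be treated as assumptions and checked against the current F-UJI documentation.

```javascript
// Submit a dataset URL to an F-UJI service for automated assessment.
// Endpoint path and request body are assumptions based on the public
// F-UJI API documentation; authentication is omitted for brevity.
async function fujiEvaluate(datasetUrl, apiBase) {
  const res = await fetch(`${apiBase}/evaluate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ object_identifier: datasetUrl }),
  });
  if (!res.ok) throw new Error(`F-UJI request failed: ${res.status}`);
  return res.json();
}

// Pull the per-category percentages out of an F-UJI result summary
// for display alongside the manual assessment scores.
function fujiScores(result) {
  const pct = result.summary.score_percent;
  return { F: pct.F, A: pct.A, I: pct.I, R: pct.R, FAIR: pct.FAIR };
}
```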

Access to saved assessment results is available in the list of assessments previously run within the tool’s interface. They can also be printed and exported as CSV, and the raw F-UJI score can be viewed as JSON. This allows data custodians to inspect, save and query records of FAIR assessments for purposes such as reporting, quality control or other needs.
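The CSV export can be sketched as a simple flattening of an assessment into quoted rows; the column names and row structure here are illustrative, not the tool's actual export format.

```javascript
// Flatten one assessment into CSV text (question id, answer, comment).
// Doubling embedded quotes and wrapping every field in quotes handles
// commas and quotation marks inside free-text comments.
function assessmentToCsv(assessment) {
  const quote = (v) => `"${String(v ?? '').replace(/"/g, '""')}"`;
  const header = ['question', 'answer', 'comment'].map(quote).join(',');
  const rows = assessment.rows.map(
    (r) => [r.question, r.answer, r.comment].map(quote).join(',')
  );
  return [header, ...rows].join('\n');
}
```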

When an assessment is run in the web interface, every change is saved, so the user can return at any time, without a time limit, to continue adding and editing until the assessment is completed and the form is submitted.

Once submitted, any digital resource may be reassessed. A reassessment will be pre-populated with the most recent assessment results to speed up data entry. Where a digital resource has been assessed more than once, all sets of assessment results are displayed on the results page. The AgReFed compliance levels are shown in a separate tab for each assessment on that resource, and scores are displayed as stacked bar graphs for a visual representation of the level of FAIRness and the improvement(s) between assessment iterations.

The FAIR assessment questions, metrics and other supporting information are dynamically generated from JSON fields in the database, so that it is possible to:

  • upgrade the metrics in subsequent versions,
  • substitute the metrics for another set that may suit another organisation or domain, and
  • use different metrics for different digital resources (the tool currently supports datasets).

This means the tool can be adapted for other purposes without substantial changes to the source code.

Software Accessibility

The source code is available at https://doi.org/10.25955/22105712 and on GitHub at https://github.com/AgReFed/agrefed-fair-assessment.

A working version of the AgReFed FAIR assessment tool is available for public use at https://assessment.agrefed.org.au.

Conclusions and Future Work

Prior to tool development, community consultation and the efforts of the technical committee resulted in the AgReFed minimum standards and a set of 14 questions (). This recognises the importance of the community-driven aspect of FAIR (; ; ).

No alternative domain-specific set of FAIR questions exists against which the AgReFed standards could be compared. To ensure they are fit for the purpose of the agricultural sector in Australia, these questions were used to test seven datasets through a manual process enacted by a data steward during the first stage of AgReFed. These datasets included plant, soil and climate variables and consisted of a variety of data formats, including point observations, sensor, spatial and temporal data from different providers (). These datasets were assessed at the outset and after efforts to improve FAIRness with the help of the data steward. Applying the 14 FAIR questions to actual use cases showed they provided a workable solution. The questions, possible answers, supporting documentation and minimum standards in the FAIR assessment tool are based on this work.

Through an investigation of the literature and existing FAIR tools, shortcomings of existing tools were identified, and novel solutions were implemented to address the specific needs of the AgReFed community. The approach taken for the AgReFed FAIR Assessment Tool follows the recommendation by Gehlen et al. () of a ‘mature hybrid approach’, where ‘hybrid’ refers to a mix of automatic and manual assessment methods and ‘maturity’ refers to community involvement. Krans et al. () recommended that tool developers provide built-in guidance for users and publish a fully developed tool as licensed open source. These recommendations were implemented here. However, the AgReFed FAIR assessment tool goes one step further by using an architecture that facilitates the use of other (or multiple) sets of FAIR metrics.

Learnings from the analysis of other FAIR tools guided the development of the AgReFed FAIR assessment tool so that it will better suit the needs of the agricultural community. It is hoped that it can also provide ideas and incentives for the development and provision of better tools in other domains.

The AgReFed FAIR assessment tool has undergone preliminary user acceptance testing. Testing was conducted in-house, and a link to a test version of the tool was circulated to AgReFed partners. The tool was also demonstrated to ARDC and other partners. Feedback received during the early stages (mainly user interface design improvements) was incorporated into the tool, and later feedback has been added to future work. At the time of writing, 27 users had signed up for the assessment tool.

It was released as open-source code and as an online tool on the AgReFed website in December 2022. This release addresses one of the milestones of the AgReFed platform project, an ARDC-supported platform transforming agricultural research. The assessment tool webpages and the database of tested resources (and FAIR scores obtained) will be monitored to collect statistics, informing the evolution and monitoring the impact of AgReFed on enabling FAIR agricultural data over time. The public GitHub repository will also be monitored. These statistics will provide feedback regarding the acceptance and usefulness of the assessment tool and guide future development.

The current suite of FAIR assessment tools (including the AgReFed FAIR Assessment Tool) relates to datasets (or collections of datasets) more than to other digital resources within the FAIR ecosystem. There is significant overlap in metadata requirements between these, but it has been recognised that research software, models, algorithms, services, ontologies and vocabularies should be assessed differently from datasets (; ; ; ; ). Therefore, ideally, an assessment tool should use specific questions, supporting assessment help, evidence and possible answers for each of those resource types. As the AgReFed FAIR Assessment Tool has been designed to dynamically load a set of questions (and related information), the functionality to provide different assessment types is already accommodated within the tool, and only minor additional changes would be required for full implementation. More work is required to develop sets of questions for different digital resources, which could promote the recognition of research software and vocabularies as first-class research outputs. Such recognition is identified as a vision both nationally by the Australian Research Data Commons () and internationally by the Research Data Alliance ().

It is expected that FAIR implementations will continue to evolve (). This may mean the FAIR assessment metrics need to be updated. The updating process can be accommodated in the tool through the same dynamic metric loading system mentioned above.

For example, it has been recommended to add data quality to FAIR metrics (), as FAIR does not address the quality of data. In line with most FAIR implementations, the AgReFed FAIR metrics currently make no mention of data quality beyond requirements for open and interoperable file formats. Accordingly, data quality tests were not considered in the design of this assessment tool but could be added by updating the dynamically loaded metrics, if required by the AgReFed Technical Committee.

Another possible further development of the assessment tool is an improvement in the integration with the F-UJI API. Because F-UJI is open source, its repository could be forked and the code base edited to bring the assessment metrics more in line with AgReFed minimum standards. That way, the F-UJI score would more closely resemble the AgReFed threshold in the future for cases where the resource is machine-readable. Additionally, the results from the automated assessment could be used to pre-populate some of the answer fields for the AgReFed FAIR questions.

Other possible features for future versions include a bulk export of assessment scores to facilitate reporting for entities that host many digital resources; user summaries showing the number of assessments and the average FAIR score; and the ability to identify low-scoring assessments for reporting and follow-up activities. The addition of an option to delete a user account has been recommended and is planned for a future software release.

Documented benefits of data sharing and reuse in the broad research community () led to the FAIR principles, which provide guidance on data management and stewardship (). Agricultural research and industry, which produce a broad and diverse range of datasets, can benefit from applying FAIR principles to data management (), particularly in view of their overlap with the environmental sciences, animal health, soil sciences, weather and climate science and other domains. AgReFed’s FAIR assessment tool is intended to help the agricultural research community create and maintain FAIR data and metadata by providing valuable feedback on whether agricultural datasets meet community-developed criteria for findability, accessibility, interoperability and reusability. The application of objective FAIR measurements of datasets can also help inform the agricultural research community on whether the collective goal of increasing FAIRness is being achieved.