Making Drone Data FAIR Through a Community-Developed Information Framework

Small Uncrewed Aircraft Systems (sUAS) are an increasingly common tool for data collection in many scientific fields. However, there are few standards or best practices guiding the collection, sharing, or publication of data collected with these tools. This makes collaboration, data quality control, and reproducibility challenging. To address this gap, we have used iterative rounds of data modeling and user engagement to develop a Minimum Information Framework (MIF) to guide sUAS users in collecting the metadata necessary to ensure that their data is trustworthy, shareable, and reusable. This paper briefly outlines our methods and the MIF itself, which includes 74 metadata terms in four classes that sUAS users should consider collecting for any given study. The MIF provides a foundation that can be used for developing standards and best practices.


INTRODUCTION
Small Uncrewed Aircraft Systems (sUAS), commonly known as drones, are an increasingly important tool for data collection in many scientific fields. However, best practices for sUAS data capture and management are still being developed, and require further refinement and adoption (Wyngaard et al. 2019). Researchers in fields such as wildlife monitoring (Barnas et al. 2020), vegetation monitoring (Assmann et al. 2018), atmospheric sciences (Barbieri et al. 2019), and the assessment of built environments and energy infrastructure (Rakha & Gorodetsky 2018) have all called for the development of sUAS data and metadata best practices. Thus, there is a common recognition of both the immediate and long-term value of rigorous data stewardship across many of these fields.
Despite broad consensus that data and metadata best practices are needed, there is still much work to be done in developing new standards or practices that address the complex data pipeline and products typical of an sUAS project (see Wyngaard et al. 2019 for a detailed discussion of this and Figure 1 for a high-level view of a typical sUAS research workflow). Furthermore, while relevant and valuable practices, standards, ontologies, and tools exist thanks to prior work and parallel advances, none is sufficient or directly reusable for the practical needs of all aspects of sUAS data management, nor has any collection of these become a common or standardised approach to sUAS workflows and data products. For instance, the Drontology ontology focuses on describing drone hardware specifications, but not on drone processes or data output (Lammerding n.d.). There are well-established ontologies currently available for describing observational data (Open Geospatial Consortium n.d.), sensor platforms and procedures (Janowicz et al. 2018), and provenance (Lebo et al. 2013); and numerous scientific domains have developed ontologies to describe common parameters as understood by that discipline (for instance, the Climate and Forecast [CF] Metadata Conventions (Mattben n.d.) or the Environment Ontology (Buttigieg et al. 2013)). But there is a lack of formally modeled ontologies for describing sUAS platforms, flight plans, and flight patterns in particular, and no work has been published showing how these existing components might be used together to describe sUAS data. Similar parallel and partial solutions exist for the data workflow stages that require standard data formats, data product levels, qualified algorithms, and recognised processes.
A community-developed framework is needed to help guide sUAS data producers and managers in bringing together these different ontologies, and in creating effective sUAS metadata best practices. This framework should articulate the classes of metadata needed at a high level, and by different user communities.
In this paper, we describe efforts to develop such a framework through extensive sUAS user engagement. We particularly focus on our community engagement and iterative design processes. We also briefly describe the resulting Minimum Information Framework (MIF) for data captured with sUAS. A MIF is a high-level information model outlining key metadata elements (organized into classes) needed to support data sharing, management, and publication (Thomer et al. 2018; Palmer et al. 2017), all in a Findable, Accessible, Interoperable, and Reusable (FAIR) manner (Wilkinson et al. 2016). The MIF also articulates the relationships between those elements (and their classes). This framework is intended to be iteratively revised (even after this initial publication) and used in ontology and best practice development, and it should inform the selection of formal metadata best practices. The terms in the MIF can be mapped to existing standards and ontologies in creating an application profile. The MIF can also be used as a checklist for different organizations and communities to explore the kinds of metadata that might be important in facilitating data reuse.
Barbieri et al. Data Science Journal DOI: 10.5334/dsj-2023-001
This framework is not intended as a standard in and of itself, but rather as a first step towards the development of domain- or institution-specific standards and best practices. We do not provide guidance about specific tooling or other hardware set-ups that might make data more or less FAIR; instead, we outline the metadata elements that are potentially important for the provisioning of FAIR data. We describe the implications of our design further in the discussion section.

METHODS
The MIF was developed through iterative rounds of community engagement and feedback, as well as systematic analysis of sUAS user data practices. Specifically, we held a series of workshops and engagement events to build community, better understand user needs, and eventually gain feedback on our proposed framework (Wyngaard et al. 2019). We also used a research process modeling approach (Thomer et al. 2018) to develop in-depth case studies of scientific research with sUAS. We blended these approaches because data and metadata standards must be grounded in community consensus, systematic analysis of the data itself, and in the reality of users' day-to-day practices (Millerand & Bowker 2009). Our interview protocols were reviewed by the lead PI's Institutional Review Board (IRB). Because our work focused on the development of a standard, the IRB found that our work did not count as human subjects research and did not require oversight. Nevertheless, we still provided our participants with descriptions of our research aims and intentions before discussions and workshops, and obtained signed consent for the use of interview data before interviews.

PHASE I: COMMUNITY BUILDING AND INITIAL DEVELOPMENT OF MIF THROUGH RESEARCH PROCESS MODELING
We held over 20 workshops, conference sessions, and other community engagement events through organizations such as the Earth Science Information Partners (ESIP), Research Data Alliance (RDA), and American Geophysical Union (AGU) (Wyngaard et al. 2019). These efforts resulted in a broad understanding of sUAS metadata needs across earth science fields. During the 2017 ESIP sUAS Data Management Workshop, we identified three distinct cases to serve as exemplars for further metadata development. These included:
1. sUAS-based biodiversity monitoring in Colorado, contributed by researchers at USGS;
2. sUAS biomass and agricultural runoff monitoring, contributed by author Wyngaard;
3. sUAS atmospheric greenhouse gas monitoring at an agricultural site, contributed by author Barbieri.
We interviewed key stakeholders (all drone users) for each case (n = 5 total for three cases), and then used these interviews to diagram their workflows, data products, and key parameters and metadata to capture at each stage following the research process modeling method. We developed the MIF based on these results.

PHASE II: MIF REFINEMENT THROUGH FOCUSED USER FEEDBACK
The MIF was further refined through a survey of experts from the earth sciences (n = 11). These participants were different from the participants in Phase I and comprised both drone users and drone data users. We asked survey participants to rank each term on a four-point scale: 1 = 'Can't use the data without it'; 2 = 'Won't use the data without it'; 3 = 'Can take it or leave it'; or 4 = 'Don't need it, don't bother.' We simultaneously conducted hour-long, semi-structured interviews with four additional earth scientists (who did not participate in the survey or in Phase I) who both use drones in their field work and use drone data in their research. We walked through the same survey of terms and asked for responses on the same four-point scale, and received richer responses that helped us better understand how users interpreted the proposed terms in their different domains.
We reviewed and revised the MIF to incorporate this feedback. We found that our survey respondents and interview subjects sometimes offered contradictory opinions on the necessity of a particular term. These disagreements typically reflected the needs of their respective domains, as well as the difference between the terms deemed necessary for drone flight operations and management and the terms deemed necessary for data reuse. We consequently retained many terms that would not necessarily be needed by all groups, with the idea that each group could create its own application profile from the MIF.

PHASE III: PILOT INSTANTIATION OF THE MIF
Through a six-month collaboration with a Data Best Practices working group from the U.S. Long Term Ecological Research (LTER) Network, we demonstrated how the MIF might serve their data managers' emerging needs, and simultaneously refined our terms based on their feedback. None of the members of this working group had participated in the earlier development (Phase I) or refinement (Phase II) of the MIF. We worked with their team and users to rank each metadata term according to its usefulness in three contexts: Discovery (enables search in data archives); Fitness for use (enables an end user to assess whether a dataset will suit their research needs); and Necessary for reuse (details that would be needed to reuse, reprocess, or otherwise interpret the data). For all three contexts, each term was assigned a value on a scale from 1 to 5 (where 1 = not useful, 5 = essential). The LTER information managers and their users provided us with expert input on these value assignments. Based on this input, we have included these rankings in our published MIF, while also noting that these rankings may differ between user communities.
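As an illustrative sketch only, rankings of this kind could be recorded and queried as shown below. The Python stores one 1-5 score per term for each of the three contexts and selects the terms that meet a threshold in every context. The term names, the scores, and the threshold of 4 are hypothetical and are not taken from the published MIF or from the LTER rankings.

```python
from dataclasses import dataclass

CONTEXTS = ("discovery", "fitness_for_use", "reuse")


@dataclass(frozen=True)
class TermRanking:
    """Usefulness of one metadata term on a 1-5 scale (1 = not useful, 5 = essential)."""
    term: str
    discovery: int
    fitness_for_use: int
    reuse: int

    def __post_init__(self):
        # Reject scores that fall outside the 1-5 scale.
        for context in CONTEXTS:
            score = getattr(self, context)
            if not 1 <= score <= 5:
                raise ValueError(f"{self.term}: {context} score {score} is outside 1-5")

    def min_score(self) -> int:
        """Lowest score this term received across the three contexts."""
        return min(getattr(self, context) for context in CONTEXTS)


def terms_useful_in_all_contexts(rankings, threshold=4):
    """Return the terms whose score is at least `threshold` in every context."""
    return [r.term for r in rankings if r.min_score() >= threshold]


# Hypothetical terms and scores, for illustration only.
rankings = [
    TermRanking("project_title", discovery=5, fitness_for_use=4, reuse=4),
    TermRanking("sensor_model", discovery=2, fitness_for_use=5, reuse=5),
    TermRanking("battery_serial_number", discovery=1, fitness_for_use=2, reuse=3),
]
core_terms = terms_useful_in_all_contexts(rankings)
```

A community with different priorities could reuse the same structure with its own scores or a different threshold, which is consistent with the expectation that rankings will vary between user communities.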

RESULTS
We identified 74 terms, divided into the following four classes of information that must be collected to make sUAS data FAIR:

1. Project metadata: this class captures information about the project itself, including investigator names and affiliations; project plans, goals, and hypotheses; features of interest; and any access or use restrictions.

2. Individual flight metadata: this class captures information about a given flight, its plans, and its actual flight path. The elements in this class are divided into three subclasses: flight checks & calibrations, which capture information about safety and quality checks and corrections; mission plans, which capture programmed flight paths and sampling plans; and platform & payload, which capture technical specifications about the drone itself and its hardware.

3. Dataset from flight: this class contains metadata about the dataset collected on a given flight. This class is split into two subclasses: the flight log subclass, which includes metadata about the actual flight itself (not the planned flight, which is captured in the Individual flight metadata: mission plans subclass); and the observational dataset subclass, which describes the observational data collected by the flight.

4. Individual data points: this class includes metadata to contextualize individual data points within a dataset, including unique identifiers for each observation, and geographic coordinates.

Figure 2 illustrates these classes and their relationships at a high level. The class structure of the MIF takes inspiration from prior work by Thomer (2022), in which data, collecting events, and collecting sites are modeled separately, as well as from standards such as the Data Documentation Initiative (DDI), which models metadata based on different steps of the data lifecycle (Ryssevik n.d.). Figure 3 includes all terms at the time of publication; the full MIF, including term definitions, is available via Zenodo (Thomer et al. 2020). We have made recommendations for terms that should be required for discovery; terms that should be required for assessing fitness for use; and terms that are important for reuse. We recommend that terms needed in all three situations be required as minimum metadata for data collected by sUAS. We additionally note that some of the terms in the MIF overlap with mandatory terms in the DataCite Schema (a commonly used metadata standard for sharing research data) (Group 2021); we recommend that these overlapping terms also be required as minimum metadata. Again, we note that these terms may differ by community, and that user studies should be done when implementing the MIF for a given data system, community, or project.
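To make the four-class structure concrete, the sketch below models it as nested Python dataclasses. It is a minimal illustration under stated assumptions, not the MIF itself: every field name is hypothetical, each class carries only a few example fields rather than the 74 published terms, and the nesting (a project holds flights, a flight holds a dataset, a dataset holds data points) is one plausible reading of the class relationships.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataPoint:
    """Class 4: context for a single observation (hypothetical fields)."""
    observation_id: str
    latitude: float
    longitude: float


@dataclass
class FlightDataset:
    """Class 3: dataset from one flight (flight log plus observational dataset)."""
    flight_log_file: str      # flight log subclass: the flight as actually flown
    observed_property: str    # observational dataset subclass
    points: List[DataPoint] = field(default_factory=list)


@dataclass
class Flight:
    """Class 2: one flight, with each of its three subclasses reduced to one field."""
    flight_id: str
    preflight_checks_passed: bool   # flight checks & calibrations
    planned_path_file: str          # mission plans
    platform_model: str             # platform & payload
    dataset: Optional[FlightDataset] = None


@dataclass
class Project:
    """Class 1: project-level metadata (hypothetical fields)."""
    title: str
    investigators: List[str]
    access_restrictions: str = "none"
    flights: List[Flight] = field(default_factory=list)


def count_data_points(project: Project) -> int:
    """Total number of data points across all flights that have a dataset."""
    return sum(len(f.dataset.points) for f in project.flights if f.dataset)


# Hypothetical example record.
project = Project(
    title="Example vegetation survey",
    investigators=["A. Researcher"],
    flights=[
        Flight(
            flight_id="flight-001",
            preflight_checks_passed=True,
            planned_path_file="plan-001.kml",
            platform_model="example quadcopter",
            dataset=FlightDataset(
                flight_log_file="log-001.csv",
                observed_property="canopy height",
                points=[
                    DataPoint("obs-1", 40.0, -105.0),
                    DataPoint("obs-2", 40.001, -105.001),
                ],
            ),
        )
    ],
)
```

Keeping the planned path on `Flight` and the flown path on `FlightDataset` mirrors the distinction the MIF draws between the mission plans subclass and the flight log subclass.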

PILOT INSTANTIATIONS OF THE FRAMEWORK
The MIF can be used by data collectors or archivers to begin development of best practices or other guidelines for collecting and curating data. We do not expect every group to need to capture every data element. Rather, the MIF outlines important data elements that should be considered in any sUAS project.
Research teams may wish to rank terms according to their importance for a given study, context, or organization. We demonstrate the use of the MIF to develop localized best practices with a group from the LTER.
The U.S. Long Term Ecological Research (LTER) network consists of 28 sites, each of which serves both to capture baseline ecological data over the long term and to facilitate active research. We worked with the team of information managers who manage the data captured at these sites, and who are increasingly being asked to archive and advise on the sUAS data now also being captured. Managers ranked terms in the MIF according to their importance for given use cases within the LTER. The MIF was then used as the basis for development of LTER metadata guidelines for data gathered with sUAS (Gries et al. 2021). These best practices include recommendations for sUAS data repositories, design of sUAS data packages, and examples of semantic annotation. This successful pilot validated the MIF as a useful framework for best practices development.
Additionally, the MIF is being used by the Linked Data and Networked Drones (LANDRS) project (led by PIs Wyngaard and Barbieri) to build automated data annotation software tools for use onboard sUAS, with linked data principles and tool stacks at their core (Wyngaard 2021). LANDRS shares the assumption underlying the MIF (that this framework will evolve and be implemented differently in different domains) and is therefore building these tools to update automatically as the underlying sUAS data framework is updated. Doing so requires that an initial ontology be created. A significant proportion of LANDRS work has therefore been to align existing mature ontologies. The MIF has served as one of the core initial references for this work of building an aligned base sUAS ontology from already established ontologies.

DISCUSSION
The MIF can help structure and prioritize metadata collection associated with sUAS data capture. It is intended to be further refined to better suit specific research and data management needs, as demonstrated in the pilot instantiation of the MIF with LTER (Gries et al. 2021). The MIF is not intended to be a standard, but rather a reference guide and framework for the development of domain-specific standards and best practices. Future efforts to establish sUAS data standards would benefit from considering both more varied disciplinary uses of sUAS for scientific data collection and other uncrewed platforms that may operate in different environments (e.g. underwater gliders (Bachmayer et al. 2004) and sail drones (Gentemann et al. 2020; Mordy et al. 2017)). Further, the development of this MIF may also benefit data collection from these other uncrewed research observation platforms. While the MIF is based on more than six years of engagement with the scientific sUAS user community, we note that our development is limited by our working primarily with North American researchers, and during a period in which significant changes have been underway regarding sUAS regulations, sUAS adoption, and sUAS user expertise. Nevertheless, we propose that future users of the MIF will find it serves them well, particularly if they consider the following points when developing their own sUAS data standards.

USING THE MIF TO EVALUATE DATA TRUSTWORTHINESS AND FITNESS FOR USE
The MIF can be used to develop a rubric showing what metadata is necessary to render a dataset trustworthy or fit for use in a particular use case, as demonstrated in the pilot instantiation of the MIF with the LTER. This rubric could then be used to evaluate datasets for the presence or absence of this necessary metadata, and perhaps to develop a rough 'reusability score' for a collection of datasets. This would be similar to prior work using the completeness of metadata as a proxy for metadata quality (Liolios et al. 2012; Margaritopoulos et al. 2012), but with the added advantage of rooting this evaluation in community norms and consensus.
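One simple way such a score might be computed is as the fraction of rubric terms that are present in a dataset's metadata record. The sketch below assumes exactly that; the rubric, the term names, and the choice of an unweighted fraction are all hypothetical, and a community could instead weight terms by their ranked importance.

```python
def completeness_score(record, required_terms):
    """Fraction of required terms that are present and non-empty in a metadata record."""
    if not required_terms:
        raise ValueError("rubric must list at least one required term")
    present = [t for t in required_terms if record.get(t) not in (None, "", [])]
    return len(present) / len(required_terms)


def collection_score(records, required_terms):
    """Mean completeness across a collection of metadata records."""
    if not records:
        raise ValueError("collection must contain at least one record")
    return sum(completeness_score(r, required_terms) for r in records) / len(records)


# Hypothetical rubric and records, for illustration only.
rubric = ["project_title", "platform_model", "sensor_model", "flight_date"]

dataset_a = {
    "project_title": "Example vegetation survey",
    "platform_model": "example quadcopter",
    "sensor_model": "example multispectral camera",
    "flight_date": "2020-06-01",
}
dataset_b = {
    "project_title": "Example runoff survey",
    "platform_model": "",            # recorded but empty, so it counts as missing
    "flight_date": "2020-06-02",
}

score_a = completeness_score(dataset_a, rubric)              # all four terms present
score_b = completeness_score(dataset_b, rubric)              # two of four terms present
overall = collection_score([dataset_a, dataset_b], rubric)
```

Treating empty values as missing is a design choice in this sketch: a field that exists but carries no information does not help a reuser judge fitness for use.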

THE USE OF EXISTING ONTOLOGIES AND METADATA STANDARDS IN DISSEMINATING sUAS DATA
Different communities may wish to rely on different ontologies or metadata standards for reasons that suit their individual contexts, and we do not want to limit the applicability of the MIF by constraining it to a particular standard or serialization at this moment. As noted in the Introduction, though many of the terms in the MIF are present in established ontologies, there are known gaps in the available ontologies (Taleisnik 2020; Jones et al. 2021; Wyngaard et al. 2019). The MIF can be used to create an application profile of different standards and ontologies; the resulting data can be serialized as linked data or any other format that makes sense for a given community. Thus, the MIF is a useful tool to aid in bringing ontologies together for sUAS data products, and to guide further ontology development.
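As a rough sketch of what an application profile could look like in practice, the Python below maps a few MIF-style terms to properties from existing vocabularies (Dublin Core Terms, SOSA, and PROV-O) and emits a JSON-LD-style document. The MIF-style term names and the particular mappings are hypothetical choices made for illustration and are not mappings published with the MIF. Terms with no mapping are reported separately, which is one way the ontology gaps noted above might surface.

```python
import json

# Namespace prefixes for the existing vocabularies used in this sketch.
CONTEXT = {
    "dcterms": "http://purl.org/dc/terms/",
    "sosa": "http://www.w3.org/ns/sosa/",
    "prov": "http://www.w3.org/ns/prov#",
}

# Hypothetical profile: MIF-style term -> property in an existing vocabulary.
PROFILE = {
    "project_title": "dcterms:title",
    "investigator": "dcterms:creator",
    "observation_time": "sosa:resultTime",
    "flight_start": "prov:startedAtTime",
}


def apply_profile(record, profile=PROFILE):
    """Rewrite a record's keys using the profile; return (document, unmapped terms)."""
    document = {"@context": CONTEXT}
    unmapped = []
    for term, value in record.items():
        if term in profile:
            document[profile[term]] = value
        else:
            unmapped.append(term)   # no existing property chosen for this term
    return document, unmapped


# Hypothetical record, for illustration only.
record = {
    "project_title": "Example vegetation survey",
    "investigator": "A. Researcher",
    "flight_start": "2020-06-01T09:30:00Z",
    "flight_pattern": "lawnmower",
}
document, unmapped = apply_profile(record)
serialized = json.dumps(document, indent=2)
```

A different community could swap in its own `PROFILE` without changing the records themselves, which keeps the MIF terms independent of any one standard or serialization.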

WORKING WITH SOFTWARE-DERIVED METADATA
One underlying concern in this project is the accessibility of software-derived or software-generated metadata. In some sUAS platforms, not all important metadata are recorded, and those that are recorded can be hard to access or export, which limits the usability of these platforms for scientific research (Wyngaard et al. 2019). We encourage sUAS hardware and software developers to consider the MIF in their work, and ensure that the data we've identified
DG, Schuyler, TJ, Shankar, A, Smith, SW, Waugh, S, Dixon, C, Borenstein, S and de Boer, G. 2019.