Small Uncrewed Aircraft Systems (sUAS) — commonly known as drones — are an increasingly important tool for data collection in many scientific fields. However, best practices for sUAS data capture and management are still being developed, and require further refinement and adoption (Wyngaard et al. 2019). Researchers in fields such as wildlife monitoring (Barnas et al. 2020), vegetation monitoring (Assmann et al. 2018), atmospheric sciences (Barbieri et al. 2019), and the assessment of built environments and energy infrastructure (Rakha & Gorodetsky 2018) have all called for the development of sUAS data and metadata best practices. Thus, there is a common recognition of both the immediate and long-term value of rigorous data stewardship across many of these fields.
Despite broad consensus that data and metadata best practices are needed, much work remains in developing new standards or practices that address the complex data pipeline and products typical of a sUAS project (see Wyngaard et al. 2019 for a detailed discussion, and Figure 1 for a high-level view of a typical sUAS research workflow). Furthermore, while relevant and valuable practices, standards, ontologies, and tools exist from prior work and parallel advances, none is sufficient or directly reusable for all practical aspects of sUAS data management, nor has any collection of these become a common or standardised approach to addressing all aspects of sUAS workflows and data products. For instance, the Dronetology ontology describes drone hardware specifications, but not drone processes or data outputs (Lammerding n.d.). Well-established ontologies are available for describing observational data (Open Geospatial Consortium n.d.), sensor platforms and procedures (Janowicz et al. 2018), and provenance (Lebo et al. 2013); and numerous scientific domains have developed ontologies to describe common parameters as understood by that discipline (for instance the Climate and Forecast [CF] Metadata Conventions (Mattben n.d.) or the environment ontology (Buttigieg et al. 2013)). However, there is a lack of formally modeled ontologies for specifically describing sUAS platforms, flight plans, and flight patterns, and no published work shows how these existing components might be used together to describe sUAS data. Similar parallel and partial solutions exist for the data workflow stages requiring standard data formats, data product levels, qualified algorithms, and recognised processes.
A community-developed framework is needed to help guide sUAS data producers and managers in bringing together these different ontologies, and in creating effective sUAS metadata best practices. This framework should articulate the classes of metadata needed at a high level, and by different user communities.
In this paper, we describe efforts to develop such a framework through extensive sUAS user engagement. We focus in particular on our community engagement and iterative design processes. We also briefly describe the resulting Minimum Information Framework (MIF) for data captured with sUAS. A MIF is a high-level information model outlining the key metadata elements (organized into classes) needed to support data sharing, management, and publication (Thomer et al. 2018; Palmer et al. 2017), all in a Findable, Accessible, Interoperable, and Reusable (FAIR) manner (Wilkinson et al. 2016). The MIF also articulates the relationships between those elements (and their classes). The framework is intended to be iteratively revised (even after this initial publication), to be used in ontology and best practice development, and to inform the selection of formal metadata best practices. The terms in the MIF can be mapped to existing standards and ontologies to create an application profile. The MIF can also be used as a checklist with which different organizations and communities can explore the kinds of metadata that might be important in facilitating data reuse.
This framework is not intended as a standard in and of itself, but rather as a first step towards the development of domain- or institution-specific standards and best practices. We do not provide guidance about specific tooling or other hardware set-ups that might make data more or less FAIR; instead, we outline the metadata elements that are potentially important for the provisioning of FAIR data. We describe the implications of our design further in the discussion section.
The MIF was developed through iterative rounds of community engagement and feedback, as well as systematic analysis of sUAS user data practices. Specifically, we held a series of workshops and engagement events to build community, better understand user needs, and eventually gain feedback on our proposed framework (Wyngaard et al. 2019). We also used a research process modeling approach (Thomer et al. 2018) to develop in-depth case studies of scientific research with sUAS. We blended these approaches because data and metadata standards must be grounded in community consensus, systematic analysis of the data itself, and the reality of users’ day-to-day practices (Millerand & Bowker 2009). Our interview protocols were reviewed by the lead PI’s Institutional Review Board (IRB). Because our work focused on the development of a standard, the IRB determined that it did not constitute human subjects research and did not require oversight. Nevertheless, we provided our participants with descriptions of our research aims and intentions before discussions and workshops, and obtained signed consent for the use of interview data before interviews.
We held over 20 workshops, conference sessions, and other community engagement events through organizations including the Earth Science Information Partners (ESIP), the Research Data Alliance (RDA), and the American Geophysical Union (AGU) (Wyngaard et al. 2019). These efforts resulted in a broad understanding of sUAS metadata needs across the earth science fields. During the 2017 ESIP sUAS Data Management Workshop, we identified three distinct cases to serve as exemplars for further metadata development. These included:
We interviewed key stakeholders (all drone users) for each case (n = 5 total for three cases), and then used these interviews to diagram their workflows, data products, and key parameters and metadata to capture at each stage following the research process modeling method. We developed the MIF based on these results.
The MIF was further refined through a survey of experts from the earth sciences (n = 11). These participants were different from those in Phase I, and comprised both drone users and drone data users. We asked survey participants to rank each term on a four-point scale: 1 - ‘Can’t use the data without it’; 2 - ‘Won’t use the data without it’; 3 - ‘Can take it or leave it’; or 4 - ‘Don’t need it, don’t bother.’
We simultaneously conducted hour-long, semi-structured interviews with four additional earth scientists (who did not participate in the survey or in Phase I) who both use drones in their field work and who use drone data in their research. We walked through the same survey of terms and asked for responses on the same four-point scale, and received richer responses that helped us better understand how users interpreted the proposed terms in their different domains.
We reviewed and revised the MIF to incorporate this feedback. Our survey respondents and interview subjects sometimes offered contradictory opinions on the necessity of a particular term; these disagreements typically reflected the needs of their respective domains, as well as the difference between terms deemed necessary for drone flight operations and management and terms deemed necessary for data reuse. We consequently retained many terms that would not necessarily be needed by all groups, with the idea that each group could create different application profiles from the MIF.
Through a six-month collaboration with a Data Best Practices working group from the U.S. Long Term Ecological Research Network (LTER), we demonstrated how the MIF might serve their data managers’ emerging needs, and simultaneously refined our terms based on their feedback. None of the members of this working group were participants in the earlier stages of MIF development (Phase I) and refinement (Phase II). We worked with their team and users to rank each metadata term according to its usefulness in three contexts: Discovery (enables search in data archives); Fitness for use (enables an end user to assess whether a dataset will suit their research needs); and Necessary for reuse (details needed to reuse, reprocess, or otherwise interpret the data). For all three contexts, each term was assigned a value on a scale from 1 to 5 (where 1 = not useful and 5 = essential). The LTER information managers and their users provided expert input on these value assignments. Based on this input, we have included these rankings in our published MIF, while noting that the rankings may differ across user communities.
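This ranking exercise can be sketched as a simple data structure. The term names, scores, and threshold below are hypothetical placeholders for illustration, not values from the LTER exercise:

```python
# Hypothetical per-term rankings on the three contexts used with LTER:
# discovery, fitness for use, and reuse (1 = not useful, 5 = essential).
rankings = {
    "flight_date":        {"discovery": 5, "fitness": 4, "reuse": 5},
    "sensor_model":       {"discovery": 4, "fitness": 5, "reuse": 5},
    "ground_control_pts": {"discovery": 1, "fitness": 4, "reuse": 5},
}

def essential_terms(rankings, threshold=4):
    """Return terms ranked at or above `threshold` in every context."""
    return sorted(
        term for term, scores in rankings.items()
        if all(score >= threshold for score in scores.values())
    )

print(essential_terms(rankings))
```

A community adopting the MIF could swap in its own terms, thresholds, and expert rankings to derive a local profile.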
We identified 74 terms, divided into the following four classes of information that must be collected to make sUAS data FAIR:
Figure 2 illustrates these classes and their relationships at a high level. The class structure of the MIF takes inspiration from prior work by Thomer (2022), in which data, collecting events, and collecting sites are modeled separately, as well as from standards such as the Data Documentation Initiative (DDI), which models metadata based on different steps of the data lifecycle (Ryssevik n.d.). Figure 3 includes all terms at the time of publication; the full MIF, including term definitions, is available via Zenodo (Thomer et al. 2020). We have made recommendations for terms that should be required for discovery; terms that should be required for assessing fitness for use; and terms that are important for reuse. We recommend that terms needed in all three situations be required as minimum metadata for data collected by sUAS. We additionally note that some of the terms in the MIF overlap with mandatory terms in the DataCite Schema (a commonly used metadata standard for sharing research data) (Group 2021); we recommend that these overlapping terms also be required as minimum metadata. Again, we note that these requirements may differ by community, and that user studies should be conducted when implementing the MIF for a given data system, community, or project.
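This selection rule — require any term needed for discovery, fitness for use, and reuse, plus any term overlapping DataCite’s mandatory set — amounts to a small set operation. The term sets below are illustrative assumptions, not the actual MIF terms:

```python
# Hypothetical term sets; see the published MIF on Zenodo for actual terms.
needed_for_discovery = {"title", "flight_date", "spatial_extent"}
needed_for_fitness   = {"flight_date", "sensor_model", "spatial_extent"}
needed_for_reuse     = {"flight_date", "sensor_model", "processing_level"}

# MIF terms that overlap with DataCite's mandatory properties (illustrative).
datacite_mandatory_overlap = {"title", "creator", "publication_year"}

# Minimum metadata: needed in all three contexts, or mandatory in DataCite.
minimum = (needed_for_discovery & needed_for_fitness & needed_for_reuse) \
          | datacite_mandatory_overlap
print(sorted(minimum))
```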
The MIF can be used by data collectors or archivers to begin developing best practices or other guidelines for collecting and curating data. We do not expect that every group will need to capture every data element. Rather, the MIF outlines important data elements that should be considered in any sUAS project.
Research teams may wish to rank terms according to their importance for a given study, context, or organization. We demonstrate the use of the MIF to develop localized best practices with a group from the LTER.
The U.S. Long Term Ecological Research (LTER) network consists of 28 sites, each of which serves both to capture baseline ecological data over the long term and to facilitate active research. We worked with the team of information managers who manage the data captured at these sites, and who are increasingly being asked to archive and advise on the sUAS data now also being captured. These managers ranked terms in the MIF according to their importance for given use cases within the LTER. The MIF was then used as the basis for the development of LTER metadata guidelines for data gathered with sUAS (Gries et al. 2021). These best practices include recommendations for sUAS data repositories, design of sUAS data packages, and examples of semantic annotation. This successful pilot validated the MIF as a useful framework for best practices development.
Additionally, the MIF is being used by the Linked Data and Networked Drones (LANDRS) project (led by PIs Wyngaard and Barbieri) to build automated data annotation software tools, built on linked data principles and tool stacks, for use onboard sUAS (Wyngaard 2021). LANDRS shares the assumption underlying the MIF – that this framework will evolve and be implemented differently in different domains – and is therefore building these tools to automatically update as the underlying sUAS data framework is updated. Doing so requires that an initial ontology first be created, so a significant proportion of LANDRS work has been to align existing mature ontologies. The MIF has served as one of the core initial references for this work of building an aligned base sUAS ontology from already-established ontologies.
The MIF can help structure and prioritize metadata collection associated with sUAS data capture. It is intended to be further refined to better suit specific research and data management needs, as demonstrated in the pilot instantiation of the MIF with LTER (Gries et al. 2021). The MIF is not intended to be a standard, but rather a reference guide and framework for the development of domain-specific standards and best practices. Future efforts to establish sUAS data standards would benefit from considering both more varied disciplinary uses of sUAS for scientific data collection and other uncrewed platforms that may operate in different environments (e.g. underwater gliders (Bachmayer et al. 2004), saildrones (Gentemann et al. 2020; Mordy et al. 2017)). Further, the development of this MIF may also benefit data collection from these other uncrewed research observation platforms. While the MIF is based on more than six years of engagement with the scientific sUAS user community, we note that our development is limited by our working primarily with North American researchers, and during a period in which significant changes have been underway regarding sUAS regulations, sUAS adoption, and sUAS user expertise. Nevertheless, we propose that future users of the MIF will find it serves them well, particularly if they consider some of the following when developing their own sUAS data standards.
The MIF can be used to develop a rubric for showing what metadata is necessary to render a dataset trustworthy or fit-for-use given a particular set of metadata and a particular use case, as demonstrated in the pilot instantiation of the MIF with the LTER. This rubric could be further used to then evaluate datasets for the presence or absence of this necessary metadata, and perhaps to develop a rough ‘reusability score’ for a collection of datasets. This would be similar to prior work using the completeness of metadata as a proxy for metadata quality (Liolios et al. 2012; Margaritopoulos et al. 2012), but with the added advantage of rooting this evaluation in community norms and consensus.
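One way such a completeness-based ‘reusability score’ might be computed is sketched below; the rubric terms and dataset records are hypothetical, and presence/absence of a term stands in as a crude proxy for metadata quality:

```python
def reusability_score(dataset_metadata, rubric):
    """Fraction of rubric terms present (non-empty) in a dataset's metadata.
    A simple completeness proxy, in the spirit of Liolios et al. (2012)."""
    present = sum(1 for term in rubric if dataset_metadata.get(term))
    return present / len(rubric)

# Hypothetical rubric (derived from a community's MIF application profile)
# and hypothetical dataset metadata records.
rubric = ["flight_date", "sensor_model", "processing_level", "license"]
datasets = [
    {"flight_date": "2020-06-01", "sensor_model": "RGB camera", "license": "CC-BY"},
    {"flight_date": "2019-08-14"},
]
scores = [reusability_score(d, rubric) for d in datasets]
print(scores)  # one completeness score per dataset, between 0 and 1
```

A richer implementation could weight terms by their community-assigned rankings rather than treating all terms equally.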
Different communities may wish to rely on different ontologies or metadata standards for reasons that suit their individual contexts, and we do not want to limit the applicability of the MIF by constraining it to a particular standard or serialization at this time. As noted in the Introduction, though many of the terms in the MIF are present in established ontologies, there are known gaps in the available ontologies (Taleisnik 2020; Jones et al. 2021; Wyngaard et al. 2019). The MIF can be used to create an application profile drawing on different standards and ontologies; the resulting data can be serialized as linked data or in any other format that makes sense for a given community. Thus, the MIF is a useful tool both for bringing ontologies together for sUAS data products and for guiding further ontology development.
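As a minimal sketch of what such an application profile might look like when serialized as linked data, the snippet below builds a JSON-LD context mapping hypothetical MIF terms onto established ontologies (SOSA for sensing, PROV for provenance). The MIF term names and their specific mappings here are illustrative assumptions:

```python
import json

# Hypothetical application-profile context: MIF-style terms mapped onto
# SOSA (sensors/observations) and PROV (provenance) vocabulary terms.
context = {
    "@context": {
        "sosa": "http://www.w3.org/ns/sosa/",
        "prov": "http://www.w3.org/ns/prov#",
        "sensor":      "sosa:Sensor",
        "observation": "sosa:Observation",
        "derivedFrom": "prov:wasDerivedFrom",
    }
}
print(json.dumps(context, indent=2))
```

A community preferring a different serialization could express the same mappings in, for example, XML or a tabular crosswalk.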
One underlying concern in this project is the accessibility of software-derived or software-generated metadata. On some sUAS platforms, not all important metadata are recorded, and those that are can be hard to access or export, which limits the usability of these platforms for scientific research (Wyngaard et al. 2019). We encourage sUAS hardware and software developers to consider the MIF in their work, and to ensure that the metadata we have identified as likely necessary for scientific use, reuse, discovery, and reproducibility are easily accessible in their stacks. Additionally, we encourage these producers to consider whether raw or derived data are stored and exportable by end users, as these are often needed in scientific contexts.
The full MIF is available via Zenodo: https://zenodo.org/record/4124166. This archive will be updated as new versions of the MIF are released; the DOI will always resolve to the most recent version of the archive. As of this writing, the archive contains three files:
Zenodo pulls from a Github repository that hosts these files. Users are welcome to fork from this repository directly: https://github.com/akthom/sUAS_MIF.
The project was partially funded by an ESIP Lab project grant (2017). The authors wish to thank ESIP, LTER, USGS. All figures were created by Kristina Davis. Thanks to Sarah Elmendorf for feedback. Finally, thanks to all who have attended our workshops, participated in interviews, and otherwise contributed to the development of this framework.
The authors have no competing interests to declare.
A.T., L.B. J.W. conceived of the study. L.B., J.W. provided expert input. A.T., L.B., J.W., S.S. designed surveys and interviews. A.T., S.S. conducted interviews. A.T., L.B., S.S., J.W. analyzed data and refined the MIF. L.B., J.W. implemented and refined MIF. A.T., L.B., J.W., S.S. wrote and reviewed the manuscript.
Lindsay Barbieri, Jane Wyngaard and Andrea K. Thomer contributed equally to this work.
Assmann, JJ, Kerby, JT, Cunliffe, AM and Myers-Smith, IH. 2018. Vegetation monitoring using multispectral sensors — best practices and lessons learned from high latitudes. Journal of Unmanned Vehicle Systems. DOI: https://doi.org/10.1101/334730
Bachmayer, R, Leonard, N, Graver, J, Fiorelli, E, Bhatta, P and Paley, D. 2004. Underwater gliders: recent developments and future applications. In: Proceedings of the 2004 International Symposium on Underwater Technology (IEEE Cat. No.04EX869), pp. 195–200.
Barbieri, L, Kral, ST, Bailey, SCC, Frazier, AE, Jacob, JD, Reuder, J, Brus, D, Chilson, PB, Crick, C, Detweiler, C, Doddi, A, Elston, J, Foroutan, H, González-Rocha, J, Greene, BR, Guzman, MI, Houston, AL, Islam, A, Kemppinen, O, Lawrence, D, Pillar-Little, EA, Ross, SD, Sama, MP, Schmale, DG, Schuyler, TJ, Shankar, A, Smith, SW, Waugh, S, Dixon, C, Borenstein, S and de Boer, G. 2019. Intercomparison of small unmanned aircraft system (suas) measurements for atmospheric science during the lapse-rate campaign. Sensors, 19(9). DOI: https://doi.org/10.3390/s19092179
Barnas, AF, Chabot, D, Hodgson, AJ, Johnston, DW, Bird, DM and Ellis-Felege, SN. 2020. A standardized protocol for reporting methods when using drones for wildlife research. Journal of Unmanned Vehicle Systems. DOI: https://doi.org/10.1139/juvs-2019-0011
Buttigieg, PL, Morrison, N, Smith, B, Mungall, CJ, Lewis, SE and ENVO Consortium. 2013. The environment ontology: contextualising biological and biomedical entities. J. Biomed. Semantics, 4(1): 43. DOI: https://doi.org/10.1186/2041-1480-4-43
Gentemann, CL, Scott, JP, Mazzini, PLF, Pianca, C, Akella, S, Minnett, PJ, Cornillon, P, Fox-Kemper, B, Cetinić, I, Chin, TM, Gomez-Valdes, J, Vazquez-Cuervo, J, Tsontos, V, Yu, L, Jenkins, R, Halleux, SD, Peacock, D and Cohen, N. 2020. Saildrone: Adaptively sampling the marine environment. Bulletin of the American Meteorological Society, 101(6): E744–E762. DOI: https://doi.org/10.1175/BAMS-D-19-0015.1
Gries, C, Beaulieu, S, Brown, RF, Elmendorf, S, Garritt, H, Gastil-Buhl, G, Hsieh, H-Y, Kui, L, Martin, M, Maurer, G, Nguyen, AT, Porter, JH, Sapp, A, Servilla, M and Whiteaker, TL. 2021. Data package design for special cases. ver 1. DOI: https://doi.org/10.6073/pasta/9d4c803578c3fbcb45fc23f13124d052
Group, DMW. 2021. DataCite metadata schema documentation for the publication and citation of research data and other research outputs. DOI: https://doi.org/10.14454/3w3z-sa82
Janowicz, K, Haller, A, Cox, SJD, Le Phuoc, D and Lefrançois, M. 2018. SOSA: A lightweight ontology for sensors, observations, samples, and actuators. Journal of Web Semantics. DOI: https://doi.org/10.2139/ssrn.3248499
Lammerding, DM. n.d. Dronetology, the UAV Ontology. http://www.dronetology.net/.
Liolios, K, Schriml, L, Hirschman, L, Pagani, I, Nosrat, B, Sterk, P, White, O, Rocca-Serra, P, Sansone, S-A, Taylor, C, Kyrpides, NC and Field, D. 2012. The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness. Standards in Genomic Sciences, 6(3); 444–453. DOI: https://doi.org/10.4056/sigs.2675953
Margaritopoulos, M, Margaritopoulos, T, Mavridis, I and Manitsaris, A. 2012. Quantifying and measuring metadata completeness. Journal of the American Society for Information Science and Technology. 63(4); 724–737. DOI: https://doi.org/10.1002/asi.21706
Mattben, MH. n.d. CF conventions home page. http://cfconventions.org/.
Mordy, CW, Cokelet, ED, Robertis, AD, Jenkins, R, Kuhn, CE, Lawrence-Slavas, N, Berchok, CL, Crance, JL, Sterling, JT, Cross, JN, Stabeno, PJ, Meinig, C, Tabisola, HM, Burgess, W and Wangen, I. 2017. Advances in ecosystem research: Saildrone surveys of oceanography, fish, and marine mammals in the bering sea. Oceanography, 30(2); 113–115. DOI: https://doi.org/10.5670/oceanog.2017.230
Open Geospatial Consortium. n.d. Observations and measurements. Available at https://www.opengeospatial.org/standards/om. Accessed: 1-31-2019.
Palmer, CL, Thomer, AK, Baker, KS, Wickett, KM, Hendrix, CL, Rodman, A, Sigler, S and Fouke, BW. 2017. Site-based data curation based on hot spring geobiology. PLOS ONE, 12(3); e0172090. DOI: https://doi.org/10.1371/journal.pone.0172090
Rakha, T and Gorodetsky, A. 2018. Review of Unmanned Aerial System (UAS) applications in the built environment: Towards automated building inspection procedures using drones. Automation in Construction, 93; 252–264. DOI: https://doi.org/10.1016/j.autcon.2018.05.002
Taleisnik, S. 2020. Ogc testbed-16: Aviation engineering report. https://docs.ogc.org/per/20-020.html.
Thomer, AK. 2022. Integrative data reuse at scientifically significant sites: Case studies at yellowstone national park and the la brea tar pits. Journal of the Association for Information Science and Technology, 73(8); 1155–1170. DOI: https://doi.org/10.1002/asi.24620
Thomer, AK, Wickett, KM, Baker, KS, Fouke, BW and Palmer, CL. 2018. Documenting provenance in noncomputational workflows: Research process models based on geobiology fieldwork in Yellowstone National Park. Journal of the Association for Information Science and Technology, 69(10); 1234–1245. DOI: https://doi.org/10.1002/asi.24039
Thomer, A, Swanz, S, Barbieri, L and Wyngaard, J. 2020. A minimum information framework for the FAIR collection of earth and environmental science data with drones. DOI: https://doi.org/10.5281/zenodo.4124167
Wilkinson, MD, Dumontier, M, Aalbersberg, IJ, Appleton, G, Axton, M, Baak, A, Blomberg, N, Boiten, J-W, da Silva Santos, LB, Bourne, PE, Bouwman, J, Brookes, AJ, Clark, T, Crosas, M, Dillo, I, Dumon, O, Edmunds, S, Evelo, CT, Finkers, R, Gonzalez-Beltran, A, Gray, AJG, Groth, P, Goble, C, Grethe, JS, Heringa, J, Hoen, PACT, Hooft, R, Kuhn, T, Kok, R, Kok, J, Lusher, SJ, Martone, ME, Mons, A, Packer, AL, Persson, B, Rocca-Serra, P, Roos, M, van Schaik, R, Sansone, S-A, Schultes, E, Sengstag, T, Slater, T, Strawn, G, Swertz, MA, Thompson, M, van der Lei, J, van Mulligen, E, Velterop, J, Waagmeester, A, Wittenburg, P, Wolstencroft, K, Zhao, J and Mons, B. 2016. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3; 160018. DOI: https://doi.org/10.1038/sdata.2016.18
Wyngaard, J, Barbieri, L, Thomer, A, Adams, J, Sullivan, D, Crosby, C, Parr, C, Klump, J, Raj Shrestha, S and Bell, T. 2019. Emergent challenges for science suas data management: Fairness through community engagement and best practices development. Remote Sensing, 11(15). DOI: https://doi.org/10.3390/rs11151797