Global Community Guidelines for Documenting, Sharing, and Reusing Quality Information of Individual Digital Datasets

Open-source science builds on open and free resources that include data, metadata, software, and workflows. Informed decisions on whether and how to (re)use digital datasets depend on an understanding of the quality of the underpinning data and relevant information. However, quality information, being difficult to curate and often context specific, is currently not readily available for sharing within and across disciplines. To help address this challenge and promote the creation and (re)use of freely and openly shared information about the quality of individual datasets, members of several groups around the world have undertaken an effort to develop international community guidelines with practical recommendations for the Earth science community, collaborating with international domain experts. The guidelines were inspired by the guiding principles of being findable, accessible, interoperable, and reusable (FAIR). Use of the FAIR dataset quality information guidelines is intended to help stakeholders, such as scientific data centers, digital data repositories, and producers, publishers, stewards and managers of data, to: i) capture, describe, and represent quality information of their datasets in a manner that is consistent with the FAIR Guiding Principles; ii) allow for the maximum discovery, trust, sharing, and reuse of their datasets; and iii) enable international access to and integration of dataset quality information. This article describes the processes used to develop the guidelines, which are aligned with the FAIR principles, presents a generic quality assessment workflow, describes the guidelines for preparing and disseminating dataset quality information, and outlines a path forward to improve their disciplinary diversity.
Peng et al. Data Science Journal DOI: 10.5334/dsj-2022-008

1. BACKGROUND
Informed decisions on whether and how to (re)use particular digital datasets rely on knowledge about aspects of data and metadata quality, including their completeness, accuracy, provenance and timeliness (Digital Science et al. 2019; Peng et al. 2021a). Quality assessments also improve the reliability and usability of both data and metadata (Callahan et al. 2017) and are crucial for supporting open-source science and data-driven policy-making processes (Peng et al. 2020a; 2021a).
A dataset in this article refers to a collection of data that is identifiable (ISO 19115-1 2014), and has the potential to be curated or published by a single actor (W3C 2020). A particular dataset can digitally represent a group of observations, a data product from a specific version of a processing algorithm based on observations, output of numerical model(s), or outcomes of laboratory experiments.
Dataset quality information embodies information about the quality or state of data (input, output, and ancillary), metadata, documentation, software, procedures, processes, workflows, and infrastructure that were created or utilized during the entire lifecycle of a dataset (Peng et al. 2021a). Therefore, the focus of this article is on dataset quality, not just data quality.
To be effectively shared and utilized, quality information needs to be consistently curated, preferably traceable, and appropriately documented (Peng et al. 2021a). The granularity of this quality documentation may vary and is sometimes very fine (e.g., per-observation in the case of volunteered observations), but the critical common resolution required to support FAIR data publishing is the individual dataset level.1 Quality assessment results also need to be represented consistently, updated regularly, and should be integrable across systems, services, and tools to enable improved data sharing (Henzen et al. 2021; Wagner et al. 2021; Peng et al. 2021a).
1 https://www.gbif.org/data-quality-requirements
While the need to assess the quality of data and related information for a particular dataset is well recognized, a framework for evaluating and presenting such quality information to data users (e.g., Figgemeier et al. 2021) may not have been sufficiently developed or addressed for disciplinary or interdisciplinary use. In response, an international workshop convening global Earth science and interdisciplinary domain experts was held virtually on 13 July 2020 to explore the needs and challenges of preparing and documenting dataset quality information consistently throughout the complete dataset lifecycle. A number of challenges were identified in Peng et al. (2020b), and three are highlighted below.
First, the selection of relevant quality attribute(s) (e.g., accuracy, completeness, relevancy, timeliness, etc.) is largely dependent upon context and can yield multiple quality categories and practical dimensions (Lee et al. 2002; Ramapriyan et al. 2017; Redman 1996; Wang & Strong 1996). This multi-dimensionality makes the assessment of dataset quality a complex endeavor. For example, the quality attribute of completeness can refer to the completeness of data values in both spatial and temporal spaces, or the completeness of metadata elements or content. The multi-dimensionality of dataset quality has been discussed in detail by Peng et al. (2021a).
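To make the completeness example concrete, the following minimal Python sketch computes two different completeness measures that share one attribute name. The function names, the sample time series, and the list of required metadata elements are illustrative assumptions, not part of the guidelines.

```python
from typing import Mapping, Optional, Sequence


def value_completeness(values: Sequence[Optional[float]]) -> float:
    """Fraction of expected data values that are present (not None)."""
    if not values:
        return 0.0
    present = sum(1 for v in values if v is not None)
    return present / len(values)


def metadata_completeness(record: Mapping[str, object],
                          required: Sequence[str]) -> float:
    """Fraction of required metadata elements that are present and non-empty."""
    if not required:
        return 1.0
    filled = sum(1 for key in required if record.get(key) not in (None, ""))
    return filled / len(required)


# Hypothetical time series with two missing time steps out of eight.
series = [1.2, None, 0.9, 1.1, None, 1.0, 1.3, 0.8]

# Hypothetical metadata record checked against four required elements.
record = {"title": "Example dataset", "identifier": "", "license": "CC-BY-4.0"}
required = ["title", "identifier", "license", "lineage"]

print(value_completeness(series))               # 0.75
print(metadata_completeness(record, required))  # 0.5
```

The two numbers answer different questions about the same dataset, which is why a reported "completeness" value is hard to interpret without a description of what was measured.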
An example of grouping dataset quality into four aspects (i.e., science, product, stewardship, and service) through the entire dataset lifecycle is shown in Figure 1. For each aspect, three important stages are listed along with selected quality attributes, which do not constitute an exhaustive list. Those dataset lifecycle stages do not necessarily cover all activities. They may not necessarily happen sequentially, and may also occur in more than one quality aspect. For example, the 'Evaluate' part of the lifecycle in the 'Product' quadrant may overlap with the 'Science' quadrant by influencing the 'Validate' part. However, generally speaking, activities in the dataset lifecycle identified in the 'Science' quadrant occur before those in the 'Product' quadrant, as noted by the direction of the arrows in Figure 1. Note that the term 'Develop' used in the 'Science' quadrant also includes data observation/acquisition. The feedback and improvement cycle can occur in any one of the stages.
Second, quality attributes are often not defined, measured, or captured consistently, even within one discipline. Moroni and colleagues recently observed such complexity as it pertains to the uncertainty of Earth science data (Moroni et al. 2019). Consistency in defining quality attributes and converging to standardized assessment models may be optimal for sharing, but more progress needs to be made, and whether such consistency is achievable remains to be seen. A step towards cross-domain interoperability, however, may be achieved by thorough documentation of domain-specific quality assessment techniques and metrics and the full provenance of the quality assessment. This allows transformations to be applied to dataset quality scores when this is possible and appropriate, e.g., computation of an exceedance value or quantile from a mean and standard deviation (Bastin et al. 2013, Section 5.1).
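As an illustration of such a transformation, the sketch below derives an exceedance probability and a quantile from a documented mean and standard deviation. It assumes the quantity is normally distributed. That assumption is made here for illustration only and is not stated in the source, and the function names are hypothetical.

```python
from statistics import NormalDist


def exceedance_probability(mean: float, std: float, threshold: float) -> float:
    """P(X > threshold), assuming X is normally distributed."""
    return 1.0 - NormalDist(mu=mean, sigma=std).cdf(threshold)


def quantile(mean: float, std: float, p: float) -> float:
    """Value below which a fraction p of the assumed normal distribution lies."""
    return NormalDist(mu=mean, sigma=std).inv_cdf(p)


# Hypothetical quality statement: mean 10.0, standard deviation 2.0.
print(round(exceedance_probability(10.0, 2.0, 12.0), 4))  # 0.1587
print(round(quantile(10.0, 2.0, 0.975), 2))               # 13.92
```

The conversion is only meaningful when the distributional assumption is documented alongside the score, which is the kind of provenance the paragraph above calls for.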
The third challenge is associated with the paradigm shift in the capabilities of the designated community of scientific data: from domain-literate users familiar with the scientific context and intended use of data products, to potential users representing diverse fields of inquiry (Baker et al. 2016), with increasing demand for machine interoperability. Therefore, the wide range of stakeholders and data users, including those with very little or no science background, should be considered so that research data and related information can be readily analyzed, interpreted, understood, and in some cases acted upon (Peng et al. 2021a).
Any effort to maximize the sharing of quality information requires collaboration among members of the entire community across science, data management, and technology domains. Recognizing that, 32 workshop participants, all international domain experts, issued an open 'call-to-action for global access to and harmonization of quality information of individual Earth science datasets' (Peng et al. 2021a). In response to that call to action and further motivated by the needs of and interest from the global Earth science community, the International FAIR Dataset Quality Information (FAIR-DQI) Community Guidelines Working Group was formed.
Working group members comprise international domain experts, such as data producers and contributors, data managers and curators from scientific institutes and data centers, and data consumers and publishers. Given their common interest in dataset quality information, this group of people can be regarded as a 'Community of Practice (CoP)' (E. Wenger-Trayner & B. Wenger-Trayner 2015). Together, the members of this group possess valuable first-hand knowledge and expertise in dealing with the challenges of developing, managing, disseminating,

Figure 1
Brief description of four quality aspects (i.e., science, product, stewardship and service) throughout a dataset lifecycle, three key stages and a few quality attributes associated with each quality aspect (e.g., define, develop, and validate stages for the science quality aspect). The quality aspects and associated stages are based on Ramapriyan et al. (2017).

Since September 2020, the members of this working group have been working collaboratively to develop practical guidelines for data managers and repositories to follow when preparing, representing, and reporting on the quality of individual datasets. These guidelines build on the success of the FAIR Guiding Principles for data sharing (Wilkinson et al. 2016) and on the extensive expert knowledge and practical experiences of working group members, while leveraging community practices. This article describes the development principles and processes, captures the outcomes of this international community effort, and presents a path forward toward enhancing the coverage of disciplines beyond Earth sciences.
This article is organized as follows. A background has been provided in this section. The principles, scope, goals, and intended audience for the development of the guidelines are provided in Section 2, while the development process is described in Section 3. The guidelines developed are presented in Section 4, with a workflow for initiating and carrying out quality assessment, as well as a description of crosswalks to elements of the FAIR Principles. Potential impact of the guidelines, benefits of CoP, and path forward are discussed in Section 5, with a conclusion in Section 6.

2A. DEVELOPMENT PRINCIPLES
The following principles are utilized to guide the development of the guidelines, based on feedback from the Earth science community:
i. A holistic dataset lifecycle approach should be adopted for developing guidelines.
ii. Guidelines should be produced in an iterative manner with continuous community engagement for feedback.
iii. Guidelines should be independent of specific quality attributes, assessment types, and context of applications.
iv. Any methodology that is utilized to evaluate certain dataset quality attribute(s) should be findable and accessible, and preferably be interoperable and reusable for both human users and machine users.
v. The assessment results should be openly available and findable, accessible, interoperable, and reusable to both human users and machine users.
vi. Transparent and quantifiable quality assessments should be a part of a dataset quality management framework.
vii. Guidelines should be regularly updated and version controlled.

2B. SCOPE
Given the complexity of dataset quality attributes and different contexts of their fitness for use, the guidelines will focus on providing guidance for capturing and representing dataset quality information consistently, adapting the FAIR Guiding Principles. Preparing such guidance will foster data use by providing users with consistent, timely, and accessible information that is available to effectively make educated data (re)use decisions for their unique application requirements. The guidelines do not focus on what quality attributes, aspects, or dimensions to assess; what assessment models to use; or how to assess dataset quality. However, a basic workflow has been developed, and practical examples are provided as references to help organizations and data stewards get started.
A dataset lifecycle in the context of this article starts at the planning and designing stage of developing a data product (Figure 1).2 It will not touch on sensor algorithms or model development and deployment. However, it is also important to capture and describe quality information such as algorithm model parameters (e.g., accuracy, precision, uncertainty) during these development and deployment stages, because the quality information from these stages is critical for identifying error sources; estimating data product uncertainty (Moroni et al. 2019); and examining error progression to downstream applications (e.g., Matthews et al. 2013).
2 It is possible that planning of data products starts long before data are collected, as for satellite missions.

2C. GOALS
This international community effort has been undertaken to develop guidelines for the Earth science community, in collaboration with international domain experts on data and information quality. The primary objective of the guidelines has been to offer the Earth science community actionable recommendations that can be adopted by a variety of stakeholders to consistently capture, represent, and integrate dataset quality information. Treating dataset quality information as a digital object, consistent with the FAIR Guiding Principles, improves its potential for sharing and reuse with more targeted practicality. Care was taken so that the guidelines would be general enough to be readily adopted or adapted by other research science communities. The optimal goal is to foster global access to and harmonization of quality information of datasets as a critical step towards facilitating open-source science in both machine- and human-friendly environments, as called for by Peng et al. (2021a).

2D. INTENDED AUDIENCE
All data stakeholders may benefit from the community guidelines:
- Data producers will find these useful to ensure at the point of acquisition that critical attributes are captured. Such attributes will later be used to ascertain the quality of the data they are capturing (e.g., uncertainty of location/measurements, instrument parameters, metadata attributes on the instrument used to acquire the data).
- Data publishers and data curators may find the community guidelines valuable for improving the quality information associated with the data that they publish and manage.
- Sponsors and funders may find the guidelines helpful when reviewing data management plans in proposals for the support of projects and programs that will be creating, curating, disseminating, and supporting the use of data. They will also find them useful during the project closure phase when assessing the quality of the data products generated against the initial project goals and data management plans.
- Data users may find that the guidelines improve their understanding of quality issues when determining whether a particular data product or service is appropriate for their intended use and what the limitations may be for using the data. This could support the application of 'confidence levels' to certain information derived from the data.

3. DEVELOPMENT PROCESS: TIMELINES AND WORKFLOW
This section provides a detailed description of the process of developing a framework through an international collaboration with the expectation that it will be useful for other groups or communities that may be considering similar endeavors.
The idea of potentially developing a framework for consistently capturing quality information to enable the use of Earth science datasets was initiated in September 2019 (Figure 2). Follow-on discussions on community needs and the prospect of developing community guidelines for documenting and reporting dataset quality information, as described in Peng et al. (2020b), were carried out among several groups across the globe. These groups include the Earth Science Information Partners (ESIP) Information Quality Cluster [...]
Support from the ESIP leadership was committed in early 2020 to sponsor a whole-day, in-person, international workshop prior to the ESIP 2020 summer meeting (SM20) with an additional report-out session during the SM20. The goal of the pre-ESIP workshop was to convene international domain experts to kick off the development of the guidelines by exploring the needs, challenges and current state of documenting and reporting dataset quality information.
Invitations for participation were sent to prospective collaborators.
In the wake of the COVID-19 pandemic, the in-person workshop was changed to a virtual event, allowing it to be extended to a wider audience. A case statement was drafted and published to help set the stage and communicate the effort (Peng et al. 2020a). The workshop website 3 was established to host the workshop materials and additional resources (Figure 3).

Figure 2
Schematic diagram of timelines of the initiation, planning, development, community review, and first baseline of the guidelines document. The guidelines document will be updated in the future to improve its coverage in diverse disciplines.

About 80 ESIP and invited international domain experts, affiliated with over 40 private, academic and governmental institutions from nine countries within North America, Oceania, and Europe, registered for the workshop (Peng et al. 2020b). Two live 90-minute virtual workshop sessions were held on July 13, 2020, to accommodate attendees from different time zones. More than 45 workshop registrants attended the first live session while approximately 25 attended the second. About 45 ESIP SM20 registrants attended the subsequent report-out session. Prior to this workshop, a mini-workshop had been held by AU/NZ DQIG on July 6, 2020, where 57 had registered and 27 participated actively.
Eleven invited speakers presented during the two live virtual workshop sessions and an additional three presented at the 90-minute report-out session during SM20. Invited speakers represented diverse international organizations, including major international space agencies and satellite programs, such as the NOAA Joint Polar Satellite System (JPSS) program (Goldberg & Zhou 2020), European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) (Schulz 2020), and European Space Agency (ESA) (Albani & Maggio 2020). Presentations described data stewardship activities at global organizations, such as the Group on Earth Observations (Downs 2020) and the World Meteorological Organization (Lief et al. 2020); as well as major national Earth science data and service centers, including those for NASA (Wei et al. 2020) [...] (see Table 1 of Peng et al. 2020b for the full list of presentation titles, affiliated organizations, and citations).
The speakers shared their knowledge to help participants ascertain the complexity and multidimensionality of curating dataset quality information. This knowledge exchange allowed participants to understand why Earth science organizations need to prepare and describe data quality information throughout the entire dataset lifecycle, covering stages from data product design and production, through data and metadata curation for preservation and access, to data use by servicing data to consumers. It also helped attendees appreciate the challenges those organizations face and learn about the different approaches taken. These informative presentations provided perspective for productive discussions among participants during the live sessions. Notes were recorded online in a collaborative Google Doc, and offline discussions continued following the workshop during the two weeks of the virtual SM20. For many of the over twenty non-US pre-ESIP workshop attendees, this was their first time engaging with the ESIP community (Peng et al. 2020b).
The strong need for practical guidelines was recognized as an opportunity to provide the community with guidance to improve data sharing by consistently preparing and representing information about the quality of datasets. The absence and limitations of currently available guidance were also recognized (Peng et al. 2021a). Participants of both the pre-ESIP workshop and the subsequent SM20 session stressed the need for such guidelines to be created by the community and for the community through an iterative process with community feedback (Peng et al. 2020b).
Several community calls to participate voluntarily in an international working group were announced during the pre-ESIP workshop and the subsequent SM20 session, along with messages to relevant Earth science email lists, including the ESIP community list. Since September 2020, over twenty international domain experts have joined the working group, which has begun developing the guidelines by consolidating community recommendations first (Figures 2 and 3). A white paper on the guidelines was published for community review in April 2021 (Peng et al. 2021b, version 3). Extensive outreach was conducted by working group members to share the initial draft of the guidelines document with the Earth science and geospatial data community (e.g., Lacagnina et al. 2021a-b; Peng et al. 2020c; Peng et al. 2021c-i; Wyborn et al. 2021). The guidelines document, partially reproduced below, has since been revised to release the first baseline version, which reflects community comments and suggestions.

4. FAIR DATASET QUALITY INFORMATION GUIDELINES
In this section, we first define a basic workflow with relevant elements to consider when setting out to assess dataset quality and curate quality information. A set of guidelines developed by the International FAIR-DQI Community Guidelines Working Group is then presented, followed by crosswalks between the guidelines and the FAIR Guiding Principles.

4A. BASIC WORKFLOW FOR CURATING DATASET QUALITY INFORMATION
While assessing dataset quality is multi-dimensional (Peng et al. 2021a), there are common aspects. Knowledge about these common aspects may help to set the direction for the right approach in each specific case of assessing quality and reporting assessment results.
To help organizations and data stewards address the challenge of where to start when curating and reporting dataset quality information, we have developed a typical workflow (Figure 4). This approach is inspired by the quality evaluation procedures defined in ISO 19157 (2013) and Six Sigma (e.g., Cordy & Coryea 2006), and follows the steps outlined below to define, measure, analyze, and improve, as presented in Lee et al. (2002) for organizing data quality management.
The workflow highlights some of the basic ingredients and elements to be considered at each step when curating dataset quality information. We add the dissemination, a.k.a. 'reporting' in ISO 19157 (2013), of dataset quality information, which is becoming an increasingly important task for building trust between data providers and end-users and for improving data usability. 4 As shown in Figure 4, the following two steps are needed prior to carrying out any assessment activity.
Step 1: Quality specification - Curating dataset quality information should start with defining the quality attribute(s), aspect, or dimension that will be assessed, determining the level of granularity (variable, ensemble member, model or algorithm), and identifying which data and quality attribute should be prioritized. This step will need some profiling, that is, an initial analysis of the available data to understand the challenges and the most critical issues, in order to set priorities and determine the appropriate strategy to deploy (e.g., Cosoli & Grcic 2019; Woo & Gourcuff 2021).
Step 2: Evaluation specification - The next step involves identifying or developing an approach (or method) to evaluate the identified quality attribute(s) or assess its maturity. Example approaches could include a statistical analysis approach (Wu et al. 2017) or a scientific maturity matrix (Zhou et al. 2016). In this step, the framework for the evaluation is defined. It is important to describe the identified quality attribute or dimension, the evaluation method used, and the protocols, standards and workflows applied (e.g., Cosoli & Grcic 2019; Lemieux [...] Popp et al. 2020; Woo & Gourcuff 2021; Wu et al. 2017; Zhou et al. 2016). A well-documented quality evaluation helps to increase transparency, verifiability, reproducibility, and resilience of the quality evaluation process.
The next two steps are important to capture and convey the resultant quality information.
Step 3: Evaluation execution - During this stage, the actual assessments are performed based on the tools, approaches and priorities defined in the previous steps. While doing this, the assessments should be captured in structured, human- and machine-readable, and standard-based formats (e.g., Heydebreck et al. 2020; Peng et al. 2019a).
Step 4: Quality dissemination - The results of the assessments represent the core of the dataset quality information and need to be disseminated with the data for the benefit of end-users. Feedback from users on data quality is beneficial to data producers to initiate data improvement processes. For reproducibility purposes, it is recommended that the operations performed to produce the quality information also be published (e.g., Davies & Sommerville, 2020). In this step, the mechanism for quality information dissemination (e.g., metadata, web page, API) is implemented and put into practice.
Finally, feedback from users on dataset quality information should be sought and evaluated to improve the quality information provided along with how the information is disseminated.
Step 5: Monitoring and improvement - The feedback collected in the previous step and the experience gained during the assessments are reviewed to identify improvements to the protocols, tools, and approaches and to redefine priorities in the assessment process (e.g., Cosoli & Grcic 2019; Woo & Gourcuff 2021). This step runs continuously from assessment through dissemination, as it helps to improve the curation of quality information.
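The five steps above can be sketched in code. The following minimal Python sketch is illustrative only and is not taken from ISO 19157 or from the works cited above: the chosen quality attribute (completeness), the evaluation method (fraction of non-missing values), the field names, the threshold, and the sample values are all assumptions made for this example.

```python
import json

# Step 1 (quality specification): hypothetical attribute and granularity.
quality_spec = {"attribute": "completeness", "granularity": "variable"}

# Step 2 (evaluation specification): hypothetical method, documented with its version.
evaluation_spec = {"method": "fraction of non-missing values", "version": "0.1"}


def fraction_present(values):
    """Return the fraction of entries that are not missing (None)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


# Made-up sample data; None marks a missing observation.
dataset = {
    "sea_surface_temperature": [14.2, None, 14.5, 14.4],
    "salinity": [35.1, 35.0, 35.2, 35.1],
}


def run_assessment(data):
    """Step 3 (evaluation execution): apply the method and keep the result structured."""
    return {
        "specification": quality_spec,
        "evaluation": evaluation_spec,
        "results": {name: fraction_present(vals) for name, vals in data.items()},
    }


def below_threshold(results, threshold=0.9):
    """Step 5 (monitoring): list variables scoring below a hypothetical threshold."""
    return sorted(name for name, score in results.items() if score < threshold)


record = run_assessment(dataset)
# Step 4 (dissemination): serialize to a machine-readable form that can be published.
report = json.dumps(record, indent=2, sort_keys=True)
print(report)
print("Needs attention:", below_threshold(record["results"]))
```

Keeping the specification and the evaluation method inside the same record as the results is one possible way to make the assessment traceable; a real implementation would follow whichever metadata standard the data center has adopted.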

4B. GUIDELINES FOR ENABLING FAIR DATASET QUALITY INFORMATION
The following five guidelines were developed by the International FAIR-DQI Community Guidelines Working Group to enable curated dataset quality information to be FAIR (i.e., findable, accessible, interoperable, and reusable) for both human users and machines. For each guideline, a crosswalk to the relevant elements of the FAIR Principles is described; these elements are denoted F1-F4 for Findable, A1-A2 for Accessible, I1-I3 for Interoperable, and R1 for Reusable (see Wilkinson et al. 2016 for the definitions of the FAIR Principles).
Guideline 1 aims to ensure that the underlying dataset is findable, comprehensively described, and potentially reusable by cross-walking to all of the Findability principles (F1-F4) and to the Reusability principles R1 (rich metadata with a plurality of relevant attributes) and R1.1 (data usage license), either directly or indirectly, denoted by solid and dashed lines, respectively.
Specifically, having a dataset PID satisfies F1 (data are assigned a unique and persistent identifier). The nature of a PID, together with the required landing page, ensures that the (meta)data are indexed and resolvable (F4). For a dataset to have a comprehensive landing page, both data and metadata need to be described with numerous pertinent attributes, which satisfies the F2 (data are described with rich metadata) and R1 principles, respectively. Including a usage license supports the R1.1 principle.
The current common practice is to include the data PID in the metadata (F3) as part of the process of assigning and minting that PID. If the data PID is minted by a service provider such as DataCite, metadata should continue to be accessible even beyond the availability of the data (A2). However, since both depend largely on practices implemented by individual organizations, Guideline 1 has only an indirect crosswalk to these two FAIR principles (F3, A2).
Guideline 2 aims to ensure that the assessment model is searchable and retrievable (Figure 5). ... protocol in 2.3 lead to satisfying A1.2. Versioning by itself falls far short of the information required to assess model provenance, but it does help support provenance (R1.2). Therefore, an indirect crosswalk to R1.2 is indicated in Figure 5.
Examples of existing dataset quality assessment models and their compliance with Guideline 2 are provided in ... Minimally, a published paper (with a DOI) that describes a quality assessment model is necessary to provide access to the model. We highly recommend publishing the assessment model itself (with a DOI), for example, in one of the aforementioned repositories. Project websites are currently a common place to host assessment models, but they are often not sustainable or persistent because of the limited lifespan of projects. For example, a link broken by an organizational system migration makes the assessment model inaccessible.

Guideline 3: Capture the quality attribute(s)/aspect(s)/dimension(s), assessment method and results in a dataset-level metadata record using a consistent framework/schema that:
3.1. is semantically and structurally consistent and follows community standards, preferably compliant with national or international metadata standards that satisfy the conditions of Guideline 2 (i.e., 2.1-2.3),
3.2. includes a description of the quality attribute(s), aspect(s), or dimension(s) to be assessed,
3.3. includes a description of the assessment method and assessment model structure and version, and access date if applicable,
3.4. includes a description of the assessment results, and
3.5. includes versioning and the history of the assessments.
This guideline aims to ensure that the quality information is captured or referenced in the dataset metadata and that it is findable, accessible, interoperable, and reusable by machine end-users (Figure 5). Utilizing a metadata framework/schema that satisfies the conditions 2.1-2.3 of Guideline 2 ensures that it is findable and accessible.
The requirements to capture the quality entity (i.e., attribute, aspect, or dimension), the assessment method, and the results, together with the requirement in 3.1, help ensure that the dataset-level metadata is richly described (R1), follows metadata standards (R1.3), and is machine interoperable (I1). The assessment method is often captured by referencing it in the metadata record, which satisfies I3; the same applies when assessment results are captured by reference to a published report.
Specifically, including a description of the information related to assessments, that is, quality entity, method, and results as required in 3.2-3.5, leads to rich metadata with a plurality of relevant attributes (R1). A semantically and structurally consistent metadata record that is compliant with standards (3.1) crosswalks to I1 and R1.3. In a best-case scenario it may also meet the requirements of I2 (FAIR-compliant vocabularies), but it falls short in most cases, so only a weak mapping is denoted by the dashed line (Figure 5). The requirements in 3.5 support the provenance of the assessment results (R1.2).
Examples of existing approaches in representing quality entities, assessment models and assessment results in machine-readable quality metadata and their compliance with Guideline 3 are provided in Table 2. Additional examples can be found in Peng et al. (2021b).
Adopting or adapting existing quality metadata frameworks (and documenting any adaptation) is also recommended. If that is not possible, a new quality metadata framework or schema may be developed. In this case, the framework should allow the requirements in 3.1-3.5 to be satisfied.
Using a consistent metadata tag and including it in a schema is recommended, if applicable. For example, Peng et al. (2019a) use MM-Stew as a metadata tag to denote stewardship maturity assessment. Once the new schema is stable, registering it with schema.org or other relevant metadata schema host entities, such as DataCite, is recommended.
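As an illustration of requirements 3.2-3.5, the sketch below builds a dataset-level quality record and checks which of those items are covered. It is a hypothetical example, not an existing schema: the field names, the mapping of fields to guideline items, and every value in the record are placeholders chosen for this sketch. Requirement 3.1 (compliance with community standards) cannot be checked this way and is left out.

```python
import json

# Hypothetical field names mapped to the Guideline 3 items they are meant to cover.
REQUIRED_FIELDS = {
    "quality_entity": "3.2",
    "assessment_method": "3.3",
    "assessment_results": "3.4",
    "assessment_history": "3.5",
}


def missing_items(record):
    """Return the Guideline 3 items whose field is absent or empty in the record."""
    return sorted(item for field, item in REQUIRED_FIELDS.items() if not record.get(field))


# Placeholder values only; none of them describes a real dataset or assessment.
quality_record = {
    "dataset_id": "example-dataset-id",
    "quality_entity": {
        "name": "completeness",
        "description": "Share of expected values that are present",
    },
    "assessment_method": {
        "model": "example-assessment-model",
        "model_version": "1.0",
        "reference": "example-model-identifier",
        "access_date": "2021-01-01",
    },
    "assessment_results": {"score": 0.75, "scale": "0 to 1"},
    "assessment_history": [
        {"version": "1", "date": "2021-01-01", "note": "initial assessment"},
    ],
}

serialized = json.dumps(quality_record, indent=2)  # machine-readable form
print(serialized)
print("Missing Guideline 3 items:", missing_items(quality_record))
```

In practice the record would be expressed in the metadata standard already used by the repository rather than in an ad hoc JSON layout; the check only shows how coverage of the listed items could be verified automatically.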
Guideline 4: Describe comprehensively the assessment method, workflow, and results in at least a human-readable quality report that:

Table 2. Examples of representing quality entities, assessment models and assessment results in machine-readable quality metadata and their compliance with Guideline 3. Each row lists the values for the three examples in order.
Semantically and Structurally Consistent | Yes | Yes | Yes
Metadata Framework/Schema | International | Domain | Domain
Quality Entity Description | Yes | Yes | Yes
Assessment Method Description | Yes | Yes | Partly (contains evaluation of quality description and not description of quality assessment)
Assessment Results Description | Yes | Yes | Yes
Versioning and Assessment History | Yes | Versioning | Creation & Last Update Dates
This guideline aims to ensure, at a minimum, that the quality information is findable, accessible, citable, reusable and understandable to human end-users (Figure 5). However, we strongly encourage that quality reports also be machine readable.
Comprehensively describing the relevant information yields human-readable metadata with multiple attributes (R1: richly described metadata). Publishing the assessment report following the criteria 2.1-2.3 with an explicit open license (4.2) leads to F1 (PID), F4 ((meta)data registered in a searchable resource), A1 ((meta)data retrievable via standardized protocol), and R1.1 (clear data usage license). The inclusion of the report history (4.2) supports R1.2. Linking the report PID to the dataset-level metadata record (4.3) satisfies the F3 (PID in metadata) and I3 (references to other metadata) principles, respectively.
Examples of existing approaches in representing quality entities, assessment models, and assessment results in human-readable quality reports and their compliance with Guideline 4 are provided in Table 3. Additional examples can be found in Peng et al. (2021b).
Guideline 5: Report/disseminate the dataset quality information in an organized way via a web interface with a comprehensive description of:
5.1. the dataset,
5.2. the assessed quality attribute/aspect,
5.3. the evaluation method, and
5.4. how to understand and use the quality information.
This guideline aims to ensure that the quality information is online and comprehensively described, findable, and easily understood and trusted by providing the assessment provenance (Figure 5).
A comprehensive description of the dataset (requirement 5.1), the assessed quality attribute/aspect (5.2), the evaluation method (5.3), and how to understand and use the quality information (5.4) leads to rich metadata with a plurality of relevant attributes (F2 and R1). Because the quality information is reported or disseminated online, it is retrievable via a standardized communication protocol (A1).
Examples of existing approaches in representing assessment results online and their compliance with Guideline 5 are provided in Table 4. Additional examples can be found in Peng et al. (2021b).
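The crosswalks described for Guidelines 1, 3, 4, and 5 can also be held in a small lookup structure, which makes it easy to ask which guidelines touch a given FAIR principle. The sketch below is a transcription of the mappings mentioned in the text of this section, under two assumptions: mappings that the text explicitly calls indirect or weak are listed under `indirect`, and all other mappings mentioned in the text are listed under `stated`, without asserting how Figure 5 draws them. The names `CROSSWALK` and `guidelines_for` are hypothetical.

```python
# Crosswalk transcribed from the guideline descriptions in this section.
# "indirect" holds only mappings the text explicitly calls indirect or weak.
CROSSWALK = {
    "Guideline 1": {"stated": ["F1", "F2", "F4", "R1", "R1.1"], "indirect": ["F3", "A2"]},
    "Guideline 3": {"stated": ["I1", "I3", "R1", "R1.2", "R1.3"], "indirect": ["I2"]},
    "Guideline 4": {
        "stated": ["F1", "F3", "F4", "A1", "I3", "R1", "R1.1", "R1.2"],
        "indirect": [],
    },
    "Guideline 5": {"stated": ["F2", "R1", "A1"], "indirect": []},
}


def guidelines_for(principle, include_indirect=True):
    """Return the guidelines whose crosswalk mentions the given FAIR principle."""
    matches = []
    for guideline, links in CROSSWALK.items():
        principles = list(links["stated"])
        if include_indirect:
            principles += links["indirect"]
        if principle in principles:
            matches.append(guideline)
    return sorted(matches)


print(guidelines_for("R1"))
print(guidelines_for("F3"))
print(guidelines_for("F3", include_indirect=False))
```

A structure of this kind could support a simple self-assessment: an organization that has implemented only some guidelines can list the FAIR principles it already addresses and those still uncovered.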
There is a large diversity in current approaches to disseminate data and metadata quality information because of the dependency on the knowledge-base of the designated community for data. Data users should provide feedback on which disseminated quality information is most relevant and how it can be improved. Therefore, user engagement activities are quite relevant at this stage, including prompt responses to questions and suggestions received from users.

Table 3. Examples of human-readable dataset quality assessment reports and their compliance with Guideline 4. Each row lists the values for the three examples in order.
Quality Entity Description | Yes | Yes | Yes
Assessment Method Description | Yes | Yes | Yes
Assessment Results Description | Yes | Yes | Yes
License | Yes | Yes | Yes
Assessment History | Yes | Yes | Yes
Linked Report PID | Yes | No | Yes
Likewise, it is recommended that dataset quality information be conveyed in a manner that is easily understood and usable by data users, and that a mechanism for user feedback be provided.

5. DISCUSSION
This section briefly discusses the potential impact of the guidelines provided above, the benefits of a Community of Practice (CoP), and the path forward to increasing community awareness of the guidelines and promoting their adoption.

5A. POTENTIAL IMPACT OF THE GUIDELINES
Improving practices for documenting, sharing, and reusing information about the quality of datasets will help advance scientific progress and contribute to societal benefits through open-source science. When dataset quality information enables potential users to discover a dataset and determine whether it is appropriate for an intended use, FAIR data quality information also helps to achieve FAIR data (Peng et al. 2021a). Likewise, when information describing the quality of a dataset fosters its interoperability and reusability, the guidelines further help to make the data FAIR. Those elements of the guidelines that focus on documentation of quality assessment strategies have the additional potential to make FAIR not just the data, but also those evaluation processes. This articulation and communication of domain-specific models, protocols and assumptions can support robust interdisciplinary re-use of data.
In addition, adoption of the guidelines for dataset quality information by the Earth science community, as well as by other disciplinary communities, offers an opportunity to improve the trust that potential users have in the underlying datasets. From a user's perspective, finding relevant, trusted data is critical to driving decisions. By improving practices for documenting, sharing, and reusing information about the quality of datasets, data providers and users will have increased confidence and improved consistency when disparate datasets are accessed, overlaid, and shared to drive impact-based decisions. These guidelines can assist in establishing trusted approaches for enabling diverse in-situ observing platforms to be used with confidence when assessing, for example, water quality information in estuaries, rivers, bays, and oceans when those sensors may have been installed and funded by different state and federal agencies.
Furthermore, providing sufficient information, including quality information, for using datasets within data collections has the potential to improve trust in the data repositories that are responsible for curating and sharing data (Lin et al. 2020). Community guidelines for dataset quality information would also benefit disciplines beyond the Earth sciences, and efforts are underway to increase their disciplinary diversity.

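Table 4. Examples of existing approaches in representing assessment results online and their compliance with Guideline 5. Each row lists the values for the three examples in order.
Report Information in an Organized Way | Yes | Yes | Yes
Dataset Description | Minimal | Yes | Minimal
Assessed Quality Entity Description | Yes | Yes | Yes
Evaluation Method and Review Process Description | Yes | Yes | Yes
Description of How to Understand and Use | Some | Some | Minimal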

5B. BENEFITS OF A COMMUNITY OF PRACTICE (COP)
With common interests and passions about sharing quality information, the members of the International FAIR-DQI Community Guidelines Working Group have come together to essentially form a loosely organized CoP. The development of the guidelines benefited from the common advantages of a CoP. These include knowledge sharing on needs, challenges, and practices in curating and representing quality information from diverse Earth science domains. There are also added benefits of participating in a CoP throughout the development process. Two will be highlighted below.
One is that we are all learning together. Knowledge about other perspectives broadens our own point of view that comes from our own experiences. A large part of developing knowledge is developing consensus through learning from each other.
Another is that we bring what we have learned back to our jobs, organizations, and communities. Change is a long process of learning, accepting, and adapting, and the first and hardest part is culture change. The subtle changes we make through the knowledge we have gained can become the seeds that lead to much-needed culture change in our organizations and communities towards sharing quality information at large.

5C. PATH FORWARD
The guidelines should help organizations and data stewards get started on providing dataset quality information to data consumers, an important step towards closing the chasm between data producers and users. However, adoption often requires culture change, which demands continued engagement with the Earth science community (e.g., Lacagnina et al. 2021a).
The effective sharing and (re)use of dataset quality information needs cross-disciplinary integration. Efforts are underway to engage and collaborate with other communities and disciplines beyond Earth science, such as:
• Open Geospatial Consortium (OGC; Ivánová et al. 2021 - OGC Data Quality Workshop - citizen science, Earth science, geospatial science, machine learning, social science, urban planning),
• World Data System (SciDataCon session - astronomy, citizen science, Earth science, social science), and
• Research Data Alliance (RDA; Peng et al. 2021h - RDA 18th Plenary session - astronomy, Earth science, genomics, social science).
Activities are underway towards forming an RDA working group on making dataset quality information FAIR for the RDA community.
During our ongoing engagement, the community has pointed out that it would be beneficial to develop and provide use cases for data quality and for implementation of the guidelines. The OGC Data Quality Domain Working Group (OGC DQ DWG) is currently working towards the development of a catalog of data quality use cases, and we will be contributing to the effort.

6. CONCLUSION
The FAIR Guiding Principles described by Wilkinson et al. (2016) provide a succinct and measurable set of concepts to be used as a guideline for improving the access and reusability of data for human users and machines. Although the FAIR Principles have provided an effective way to enable data sharing, they do not explicitly describe how dataset quality information should be curated and shared.
Inspired by the FAIR Guiding Principles, a set of guidelines for curating and reporting dataset quality information was developed for both human users and machines, as a global community effort. The guidelines development effort was carried out by a Community of Practice through an iterative process guided by community feedback. The process of developing the guidelines has been described, which may be of use to inspire similar activities requiring large community consensus and uptake.
The guidelines aim to improve the availability and usability of quality information at the individual digital dataset level. Utilizing a structured quality assessment model helps to ensure the consistency of evaluation methods and results, which in turn will make it easier to capture them consistently. Capturing the assessment results in the dataset-level metadata using a consistent framework improves machine interoperability and supports integration across systems and tools. Disseminating the dataset quality information in a transparent and user-friendly way will help end users to understand and effectively use or integrate the information.
Community guidelines developed as a result of this effort bring the Earth science community one step closer to standardizing the curation and representation of dataset quality information.
The guidelines described in this article offer opportunities to enable or improve the transparency and interoperability of dataset quality information. Adopting all or part of the guidelines can contribute to the ecosystem that supports open-source science. An excellent byproduct of streamlining the curation and representation of dataset quality information is the improved likelihood of automating the curation and reporting process, leading to international access to and usability of information about the quality of individual digital datasets (Peng et al. 2021a).
Utilizing the guidelines also helps improve the overall FAIRness of a dataset by providing community-standard-based rich metadata with a plurality of relevant quality attributes and qualified references. It establishes the trustworthiness of data and ultimately improves the maturity of a dataset in multiple quality dimensions or aspects including product, stewardship, and services by improving the completeness and usability of metadata and documentation.
The international FAIR-DQI community guidelines document is a living document and is expected to evolve over time to accommodate user feedback and emerging community best practices. As indicated in Section 5C, use cases will be developed, in collaboration with OGC DQ DWG, to further improve the maturity and comprehensiveness of the guidelines and provide implementation examples for the global Earth science and geospatial community. Furthermore, in collaboration with the RDA community, an effort is underway to improve the disciplinary diversity of the guidelines.