Stewardship Maturity Assessment Tools for Modernization of Climate Data Management

Authors:

Robert Dunn,

Met Office Hadley Centre, Exeter, GB

Christina Lief,

ret. NOAA, US

Ge Peng,

Cooperative Institute for Satellite Earth System Studies between NOAA and North Carolina State University, at NOAA National Centers for Environmental Information, Asheville; Now at Earth System Science Center/NASA MSFC IMPACT, The University of Alabama in Huntsville, US

William Wright,

ret. Australian Bureau of Meteorology, AU

Omar Baddour,

World Meteorological Organization, Geneva, CH

Markus Donat,

Barcelona Supercomputing Center, Barcelona, ES

Brigitte Dubuisson,

Météo-France, Direction de la Climatologie et des Services Climatiques, Toulouse, FR

Jean-François Legeais,

Collecte Localisation Satellites (CLS), Ramonville St-Agne, FR

Peter Siegmund,

Royal Netherlands Meteorological Institute, De Bilt, NL

Reinaldo Silveira,

Sistema Meteorologico do Parana, SIMEPAR, BR

Xiaolan L. Wang,

Climate Research Division, Environment and Climate Change Canada, Toronto, CA

Markus Ziese

Deutscher Wetterdienst, Offenbach a.M., DE

Abstract

High quality and well-managed climate data are the cornerstone of all climate services. Consistently assessing how well the data are managed is one way to establish or demonstrate the trustworthiness of the data. This paper presents the World Meteorological Organization’s (WMO) Stewardship Maturity Matrix for Climate Data (SMM-CD) and the subsidiary SMM-CD for National and Regional Purposes (SMM-CD_NRP). Both matrices have been developed with the support of the WMO and its High-Quality Global Data Management Framework for Climate (HQ-GDMFC). These self-assessment tools enable data managers to discover WMO-recommended data stewardship practices, determine a roadmap for future development and improvement, and compare their processes against those of other data providers. Datasets whose maturity has been assessed are included in the WMO Climate Data Catalogue, so that users can incorporate the results of these assessments into their decision-making. The SMM-CD contains four categories (data access, usability and usage, quality management, and data management), each of which has a number of aspects, with scores assigned to one of five levels. The SMM-CD_NRP has a smaller number of categories, with scores assigned to one of four levels appropriate for operationally produced datasets which are national or regional in scope. We explore a number of case studies where these matrices have been applied, and supply links to where the Guidance Documents and Assessment Templates (which may be updated) can be found.

How to Cite: Dunn, R., Lief, C., Peng, G., Wright, W., Baddour, O., Donat, M., Dubuisson, B., Legeais, J.-F., Siegmund, P., Silveira, R., Wang, X.L. and Ziese, M., 2021. Stewardship Maturity Assessment Tools for Modernization of Climate Data Management. Data Science Journal, 20(1), p.7. DOI: http://doi.org/10.5334/dsj-2021-007
  Published on 09 Feb 2021
 Accepted on 20 Jan 2021            Submitted on 30 Sep 2020

1 Introduction and Background

All climate services, from data provision through seasonal climate forecasting, to the monitoring of, and adaptation to, climate variability and change, as well as disaster risk reduction, depend on high quality and well-managed climate data. Among the numerous challenges to implementing quality climate services at both the global and national level is that much of the existing guidance on climate data management struggles to keep pace with rapid advances in technology, current community best practices and user requirements. Although there are opportunities to benefit from these advances, in many cases the capacity to perform good data management is lacking, and it is further hindered by unstandardized terminology and the absence of a suitable regulatory framework. Hence, it is important that a robust regulatory framework, which defines standard and recommended practices and procedures for the management of the data, exists and is agreed internationally.

The World Meteorological Organization (WMO), as the recognised global authority on weather, water and climate, has sought to address these issues by developing a High Quality Global Data Management Framework for Climate (HQ-GDMFC, WMO, 2014) to enable the effective development of stewardship processes and also the exchange of high-quality climate data, based on a reliable, integrated, underpinning data infrastructure at the global, regional and national levels. The HQ-GDMFC contains the following three components (Figure 1):

Figure 1 

A schematic diagram showing three components of the WMO High-Quality Global Data Management Framework for Climate (HQ-GDMFC).

  1. A Manual on HQ-GDMFC, which establishes the regulatory framework and recommended best practices around climate data management, applicable to all entities with a mandate to manage climate data;
  2. A Stewardship Maturity Matrix for Climate Data (SMM-CD), which is an evaluation tool for quantitatively assessing the level of stewardship of climate data, which may be thought of as the level of capability exhibited in managing climate datasets;
  3. A catalogue of climate datasets, both international datasets and those of individual National Meteorological and Hydrological Services (NMHS) or other entities, that have been through the assessment process.

This is the first time that the climate community has established such a regulatory framework specifically for the management of climate data, although other such frameworks have previously been proposed by collaborative projects (e.g. the CORE CLIMAX maturity matrix [EUMETSAT, 2013]) or used in the national context (e.g. the NOAA data stewardship maturity matrix, [Peng et al., 2015]), outlined below.

The HQ-GDMFC promotes establishment of, and compliance with, standards and recommended practices for sourcing, securing, managing, assessing, and cataloguing climate data, and for sharing infrastructure such as for data exchange, analysis and data service provision. The manual describing these standards and recommended practices (WMO, 2019b) was adopted by WMO (WMO, 2019a) as part of the WMO Technical Regulations. As outlined in this HQ-GDMFC manual (WMO, 2019b), the scope of international collaboration within HQ-GDMFC is based on a set of principles:

  1. Promoting adherence to relevant WMO data policies;
  2. Registering datasets to be shared internationally for use in climate studies, monitoring and applications;
  3. Facilitating easy access to metadata and documentation underpinning the datasets;
  4. Promoting preservation and sound, standards-based management of all data that are used, or may potentially be useful for, climate-change monitoring, including backing up in duplicate repositories for the duration of their specified retention periods;
  5. Assessing and improving the maturity and quality of stewardship practices underpinning the datasets, cataloguing them for easy search, discovery and access, and promoting their use in informing policy-relevant frameworks; and
  6. Promoting acquisition of user feedback on the quality, fitness for purpose and usability of shared datasets.

The focus of this manuscript is the Stewardship Maturity Matrix for Climate Data (SMM-CD) and the subsidiary SMM-CD for National and Regional Purposes (SMM-CD_NRP). We outline herein the creation and use of these Maturity Matrices, which assess aspects of data stewardship or, simply put, how well the dataset has been created and curated to ensure the accessibility, usability and integrity of the data, and sufficient documentation for data users. The assessment is necessarily limited to those facets which can be (independently) assessed. These matrices do not explicitly assess the scientific rigour involved in creating the dataset, e.g., how reliable the underpinning observations are, details of processing, homogenisation, scientifically-based adjustments etc. What they do provide is information on the extent to which the dataset has clear documentation and support channels, is constructed with clear coding practices, applies quality control and assurance procedures, provides uncertainties, and adheres to data format and archiving standards. Datasets fulfilling these criteria may well contain reliable information and be supported and available over a long period. However, it is important that users of assessed datasets combine this information with other sources to make an appropriate choice for their application.

We outline the rationale behind the construction of a Stewardship Maturity Matrix in Section 2, as well as the process which lies behind the SMM-CD in Section 3. We detail the SMM-CD as well as the SMM-CD_NRP for national and regional purposes in Sections 4 & 5. The datasets assessed so far as well as some case studies are presented in Section 6 and we summarise in Section 7. Links to the current SMM-CD and SMM-CD_NRP documents along with other supporting information are provided at the end of this manuscript.

2 Rationale for Making a Stewardship Maturity Matrix

A data stewardship maturity assessment model in the form of a matrix can be used not only as a guide to users about the rigour of data stewardship practices, but also as a tool for monitoring and improving aspects of organizational performance in producing, managing, or servicing climate data. It is typically presented as a two-dimensional matrix. The rows identify the various facets of core stewardship functionality (e.g., data management), while the columns describe typical behaviours representing increasing maturity in practices and capability against each facet, ranging from a poorly-managed or no-capability state to an advanced, well-managed state.

A number of maturity matrices that can be applied to climate data already exist to evaluate the maturity of various data quality attributes, e.g., the NOAA/NCEI Data Stewardship Maturity Model (DSMM; Peng et al., 2015) and the EUMETSAT CORE-CLIMAX Production System Maturity Matrix (SMM; EUMETSAT, 2013). The CORE-CLIMAX SMM measures the maturity of the systems that produce datasets of essential climate variables (ECVs), while the DSMM measures the maturity of how digital datasets are being managed within the context of the Open Archival Information System (OAIS). Both are important to ensure and improve the overall quality of climate datasets for users and policy makers. A WMO-developed and supported matrix has the advantage of ensuring a mandate on current best practice, applying to all countries, nations and territories. Through the effort of the International Expert Group on Climate Data Modernisation (IEG-CDM), the WMO has developed and baselined the SMM-CD (Peng et al., 2019), leveraging community best practices and standards captured in those existing maturity assessment models to help ensure and improve the trustworthiness of climate datasets in the WMO data catalogue.

Along with the domain-specific approaches presented in the two matrices above, there are other data stewardship principles which have relevance to the SMM-CD. For example, the FAIR (Findable, Accessible, Interoperable and Reusable) Guiding Principles (Wilkinson et al., 2016) are fundamental to machine-enabled data sharing. Furthermore, the FAIR Data Maturity Indicators (DMI) endorsed by the Research Data Alliance (RDA) (Bahim et al., 2020) provide implementation guidance on what indicators to assess for “FAIR-ness”. The TRUST (Transparency, Responsibility, User focus, Sustainability, Technology) principles (Lin et al., 2020) describe sustainability and data stewardship requirements for repositories for long term FAIR-ness (collected together at Core Trust Seal – http://www.coretrustseal.org).

The SMM-CD described herein goes beyond assessing only FAIR-ness, by also evaluating the maturity of other aspects of data stewardship practices within the scope of the OAIS Reference Model. As the SMM-CD is domain specific, and has been developed for WMO Member countries, it tries to capture the current stewardship practices applied to individual datasets of Earth Science systems. Providing data stewardship maturity information utilizing the SMM-CD primarily supports the Transparency, Responsibility and User focus aspects of the TRUST principles for data repositories (Peng et al., 2020).

The SMM-CD also focusses on individual datasets, though these may be hosted on repositories which meet the CoreTrustSeal requirements or follow the TRUST principles. An independent effort is underway to examine the synthesis among DSMM, CoreTrustSeal repository requirements, and FAIR data principles (Peng et al., in prep.) but it is beyond the scope of this paper.

2.1 Using a Stewardship Maturity Matrix

The availability of a WMO-led maturity matrix allows data stewards (e.g. in National Meteorological and Hydrological Services [NMHSs]) to assess their data management practices in an internationally standardised framework, identifying gaps and other elements of their processes that would benefit most from improvement. It also allows the identification of a target level of stewardship maturity for the data they are managing, appropriate to the use-cases and resource level available, as well as a roadmap to measure progress on improving information management capability in support of WMO Programmes.

Once the stewardship maturity of a dataset has been assessed across all aspects, and scores are available, there are a number of different ways both data managers and users of the datasets can use this information. For data managers, having an independent set of assessments across a number of aspects could be useful in identifying where to focus limited resources in improving stewardship quality. There may be some “quick wins” where higher ratings for some aspects require little effort to obtain (and may even be achieved during the assessment process). Furthermore, by contrasting ratings against other similarly well-managed products, the scores from the SMM-CD may even help prioritize cost planning, resource allocation and funding for future data management with the aim of improving stewardship maturity for those datasets. Dataset creators can use the scores similarly when outlining major updates or ensuring stewardship maturity of new datasets.

There are a number of ways data users can or should use the scores from this matrix. At the simplest level, users with minimal requirements can use the scores to choose the dataset with the highest level of maturity for their specific application. Mature datasets and systems make it easy for users to assess which dataset they need. However, users are strongly encouraged to take a more in-depth approach, considering their application as well as the scores for each category and aspect. Datasets which have different aims and processing levels will have different maturity scores, and the appropriate dataset for a particular user’s application may be one with a lower overall score. For example, when studying sea surface temperatures, a user could take one of HadSST3 (Kennedy et al., 2011a, b) or NOAA-ERSSTv5 (Huang et al., 2017, Huang et al., 2018a, b). Both are highly-processed gridded (and in some cases infilled) datasets, but neither had been assessed using the SMM-CD at the time of writing. Alternatively, one could use the raw ship track and buoy information available in ICOADS R3.0 (Freeman et al., 2017), which has been assessed using the SMM-CD. However, the fact that ICOADS has been assessed using the SMM-CD does not necessarily make it the right choice for this particular application. Furthermore, once both HadSST3 and NOAA-ERSSTv5 have been assessed, scores higher than those of ICOADS would not automatically make them the right choice either.

During the construction of the SMM-CD, the differing level of resources available for data managers between countries, institutions and groups were taken into account. The first matrix to be developed focussed initially on datasets which were global in outlook, commonly used internationally, and could be thought of as “high-profile” (SMM-CD, Section 4). Subsequently, an adapted version of the SMM-CD matrix has been constructed, mindful that for regional or national operational products there may be fewer resources available for data management, and it is also likely the data will have subtly different use cases (SMM-CD_NRP, Section 5).

The assessment process for the SMM-CD and SMM-CD_NRP is a voluntary self-evaluation that the data author or manager can use to identify gaps in the management and stewardship of a dataset. As well as the score for each aspect, there is a field for the evidence supporting the score. The assessment forms in MS Word are publicly available on Figshare (Lief and Peng, 2019), along with a guidance booklet to help in completing the evaluation (Peng et al., 2019). The assessed datasets can be published in the WMO Catalogue for Climate Data. In this case, a cooperative evaluation of the assessment is carried out for the global dataset ratings. The initial 18 datasets in Table 1 were evaluated by the WMO Expert Team on Data Development and Stewardship (ET-DDS) and some of the data managers, using a template assessment form. Each evaluation was discussed with the assessment lead and, if appropriate, the ratings and comments were updated. Table 1 lists the date on which each assessment was updated following this evaluation. This evaluation process adds to the completeness and quality of the assessments.

Table 1

Details of the datasets which have been assessed by the SMM-CD up to September 2020. Updated list available at WMO Climate Data Catalogue of assessed datasets https://climatedata-catalogue.wmo.int/assessed-datasets.


| DOMAIN | DATASET | INSTITUTION | TYPE | DATE OF ASSESSMENT | WEBPAGE |
|---|---|---|---|---|---|
| Surface temperature | NOAAGlobalTemp v4.0.1 | NOAA | merged land–ocean surface temperature analysis | 2018-10-15, updated 2019-03-12 | https://www.ncdc.noaa.gov/data-access/marineocean-data/noaa-global-surface-temperature-noaaglobaltemp, http://dx.doi.org/10.1175/2007JCLI2100.1 |
| Surface temperature | HadCRUT.4.6.0.0 | Met Office Hadley Centre | gridded dataset | 2019-03-08, updated 2019-03-21 | http://www.metoffice.gov.uk/hadobs/hadcrut4 (v4.5.0.0 also at https://catalogue.ceda.ac.uk/uuid/22a878b3ada24590970974588642f585) |
| Surface temperature | GISTEMP v3 | NASA | surface temperature analysis | 2019-03-09, updated 2020-01-21 | https://data.giss.nasa.gov/gistemp/ |
| Precipitation | GPCC Full Data Monthly | DWD | globally gridded monthly totals | 2019-02-27, updated 2020-06-18 | www.doi.org/10.5676/DWD_GPCC/FD_M_V2018_100 |
| Crowdsourcing (Rain, hail & Snow fall) | CoCoRaHS | Colorado State Edu | observations | 2018-10-07, updated 2019-03-29 | https://www.cocorahs.org/ |
| Sea level | GLOSS | IOC | observations | 2018-10-17, updated 2019-04-17 | http://www.gloss-sealevel.org/ |
| Sea level | CCl-SeaLevel | ESA | satellite | 2018-10-26, updated 2019-04-30 | https://climate.esa.int/en/projects/sea-level/ |
| Sea level | C3S-SeaLevel | Copernicus Climate Change Service | satellite | 2018-10-26, updated 2019-04-30 and 2020-08-31 | https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-sea-level-global?tab=overview |
| Sea Ice | SeaIce Index | NSIDC | satellite | 2018-10-24, updated 2019-04-29 | https://nsidc.org/data/g02135, https://doi.org/10.7265/N5K072F8 |
| Ice Sheets | GLAS-DEM-500m | NASA-JPL | satellite | 2018-10-24, updated 2019-03-11 | https://nsidc.org/data/nsidc-0304, https://doi.org/10.5067/K2IMI0L24BRJ |
| Ice Sheets | GLAS-DEM-1km | NASA-JPL | satellite | 2018-10-24, updated 2019-03-18 | https://nsidc.org/data/nsidc-0422, https://doi.org/10.5067/H0FQ1KL9NEKM |
| Ice Sheets | Antarctica-GRACE | NASA-JPL | satellite | 2018-10-24, updated 2019-03-10 | https://podaac.jpl.nasa.gov/dataset/ANTARCTICA_MASS_TELLUS_GRACE_MASCON_CRI_TIME_SERIES_RL05_V1, https://doi.org/10.5067/TEMSC-ANTS1 |
| Ice Sheets | Greenland-GRACE | NASA-JPL | satellite | 2018-10-24, updated 2019-04-29 | https://podaac.jpl.nasa.gov/dataset/GREENLAND_MASS_TELLUS_GRACE_MASCON_CRI_TIME_SERIES_RL05_V1, https://doi.org/10.5067/TEMSC-GRTS1 |
| Glaciers | GLIMS | GLIMS | satellite | 2018-10-24, updated 2019-02-24 | http://www.glims.org/ |
| Climate Extremes Indices | HadEX2 | Met Office Hadley Centre | observations and model data | 2018-05-07, updated 2019-06-16 | www.climdex.org, https://doi.org/10.1002/jgrd.50150 |
| Hydrology | GRDC | Bundesanstalt fuer Gewaesserkunde | observations | 2018-10-05, updated 2019-03-19 | https://www.bafg.de/GRDC/EN/01_GRDC/grdc_node.html |
| Marine | WOD13 | NOAA and IODE | observations | 2018-09-12, updated 2019-03-29 | https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html, https://www.ncei.noaa.gov/products/world-ocean-database |
| Marine | ICOADS | NOAA | simple gridded monthly summary products | 2018-11-02, updated 2019-03-27 | https://icoads.noaa.gov/, http://dx.doi.org/10.1002/joc.4775 |

3 Development Process

The development of the SMM-CD and SMM-CD_NRP stems from the outcomes of two meetings of international domain experts with the focus on information management and climate data modernisation. We give details of these meetings, and the subsequent development to demonstrate the provenance of these matrices and the high level of consultation and discussion that formed part of their construction.

The first, the “WMO Workshop on Information Management” (WMO, 2018a), held in Geneva in October 2017, aimed to develop WMO-wide guidance on information management, to make progress on identifying datasets with good data management, and to enhance their accessibility and visibility on the WMO Information System (WIS). Attendees represented a wide range of institutions and WMO Member states, and covered many specialisms. The workshop recommended that the WMO Commission for Climatology (CCl) develop a catalogue of datasets based on best practices in current maturity models (e.g. for use in the monitoring of key climate indicators) and also provide non-technical users with improved access to data. Furthermore, an Inter-Programme Expert Team on the Climate Data Modernization Programme (IPET-CDMP) was tasked with managing the High-Quality Global Data Management Framework for Climate (HQ-GDMFC). It was recognised that in order to determine the maturity of climate datasets, they would need to be assessed against a maturity index in a process agreed by the WMO.

The second, the “WMO Expert Meeting on Climate Data Modernisation” (WMO, 2018b), held at KNMI in April 2018, aimed to develop a WMO-wide SMM-CD based on existing US and European maturity matrix models (as outlined above). Datasets assessed through this matrix would form part of a WMO Climate Data Catalogue, with discovery and access protocols for the WIS and search engines to assist non-technical users also scoped and developed at this workshop. The subject matter specialists discussed and refined the contents of the matrix as well as which datasets were to be used for real-world testing.

Further refinements and updates to the SMM-CD were put in place over the following two years, taking into account feedback from data managers, dataset developers and subject matter experts from the WMO ET-DDS and the International Expert Group on Climate Data Modernisation (IEG-CDM). In June of 2019, the Manual on HQ-GDMFC, the SMM-CD and the WMO Climate Data Catalogue were endorsed at the 18th WMO Congress.

4 The Stewardship Maturity Matrix for Climate Data (SMM-CD)

The SMM-CD has been developed with the intention of it being used as a self-assessment tool, with some external moderation and guidance. Full details about the SMM-CD are given in the Guidance Document (Peng et al., 2019) as well as on the Assessment Template (see Resources), but we give an overview here.

There are four categories in the SMM-CD, under each of which lie two or three aspects (Figure 2). A score is determined for each aspect, corresponding to the maturity scales outlined in Figure 3. In the Guidance Document, more detailed examples are given for most of the aspects at each maturity level to help with the self-assessment process.

Figure 2 

Diagram of SMM-CD Categories and Aspects. Based on Fig. 2 in Peng et al. (2019).

Figure 3 

The maturity scale structure for the WMO SMM-CD. Based on Fig. 1 in Peng et al. (2019).

The maturity scale starts at a level where few or no procedures and processes are defined or in place, or where they are unreported or poorly documented. This “ad-hoc” level could be, for example, an individual researcher creating and storing files locally on their own hard disc. At higher levels, increasingly managed and supported processes are in place across the aspects, through to an optimal level of stewardship maturity whereby many processes are demonstrably compliant with international standards.

Throughout the SMM-CD, WMO-defined requirements and standards are recommended where they are applicable. The rating for a data product should be set at the highest level for which all the descriptors in that level and in all lower levels are satisfied. In some cases, a fraction may be used to indicate that one or more criteria are satisfied at a level higher than the assigned one. All assessment ratings come with supporting information to justify the level scored. So far, 18 highly utilized global climate datasets have been assessed (Table 1). The assessment results were reviewed by the members of the WMO ET-DDS.
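The rating rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the levels are numbered 1 to 5 with level 1 as the "ad-hoc" floor, the descriptors for each level are reduced to simple true/false checks, and the function name `assess_aspect` is hypothetical. The text does not specify how a fractional rating is chosen, so the sketch only flags that some higher-level criterion is met rather than computing a fraction.

```python
def assess_aspect(satisfied):
    """Rate one aspect from descriptor checks.

    `satisfied` maps a level number (2..5, an assumed numbering) to a list
    of booleans, one per descriptor at that level. Level 1 is treated as
    the floor and needs no checks.

    Returns (level, higher_criteria_met): the highest level for which all
    descriptors at that level and every lower level hold, plus a flag
    saying whether any descriptor above that level is also satisfied.
    """
    level = 1
    for candidate in range(2, 6):
        checks = satisfied.get(candidate, [])
        if checks and all(checks):
            level = candidate
        else:
            break  # a gap at this level caps the rating

    higher_criteria_met = any(
        any(checks) for lvl, checks in satisfied.items() if lvl > level
    )
    return level, higher_criteria_met


# Hypothetical checks for a single aspect of a single dataset.
example = {2: [True, True], 3: [True, True], 4: [True, False], 5: [False]}
print(assess_aspect(example))
```

The flag is where an assessor might choose to record a fractional rating, with the supporting evidence written alongside the score.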

It should be noted that dataset maturity ratings are a snapshot of the current state which may evolve over time. The requirements or standard against which the maturity of a dataset is evaluated should be described in the assessment report prepared by the dataset point-of-contact or an evaluator.

4.1 Data Access

This category refers to the ability of users to find and then obtain the dataset, with higher levels reflecting the ease with which this is possible. At the lowest levels, personal contact is required both to know about and to receive the data, whereas at the highest levels, datasets are available through international catalogues and portals, with the ability to restrict the data downloaded to just the fields of interest.

  • Discoverability – how easy it is to find the data
  • Accessibility – how easy it is to obtain the data

4.2 Usability and Usage

This category refers both to how easily the data product can be used and to how impactful uses of the data product have been at the time of assessment. There are two aspects to usability, data portability and documentation, both of which are necessary for a dataset to be usable. At the higher levels, data products would be available in a number of standard formats, with documentation extending to tutorials and even the complete production system being available.

The inclusion of impact in this category is helpful to users, as datasets which have attracted many citations or have been used in assessment reports may indicate a high level of maturity in comparison to other, less widely used, products. However, recently released datasets (including updates to existing ones) are likely to have low citation levels, and different disciplines have varying levels of citations.

  • Data Portability – ranges from inability to transfer data in computerized form to fully machine-readable and interoperable.
  • Documentation – assesses the extent and accessibility of information on how to use the dataset or product, and hence users’ ability to determine its fitness for purpose.
  • Usage & Impact – where the dataset has been used, and how high-profile and impactful those uses have been.

4.3 Quality Management

The quality management category separates quality control (QC) and quality assurance (QA) from other aspects, such as quality assessment and data integrity. In the case of the SMM-CD, QC comprises the set of routines and checks run by the dataset creators as an integral part of the construction of the datasets. Separately, QA is the set of checks and processes in place to ensure the construction method is robust, for example code review and version control. In contrast, quality assessments are sets of analyses which make (quasi-)independent investigations into, e.g., sources of dataset uncertainty.

The data integrity aspect reflects tools available on the download pages and access portals to ensure that the data requested are the data received.

  • Quality Assurance and Control Procedure – level and accessibility of the QA and QC procedures.
  • Quality Assessment – separate assessment of dataset quality and limitations
  • Data Integrity – monitors dataflow and ingest processes to ensure data requested is data received.
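One common way to check that "the data requested are the data received" is to compare a checksum of the downloaded file against one published by the data provider. The sketch below assumes the provider publishes a SHA-256 hex digest alongside each file; the text does not name a particular tool or algorithm, and the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large data files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare a downloaded file against the digest published by the
    provider (assumed to be a hex string, possibly with stray whitespace
    or upper-case letters)."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A download script could call `matches_published_checksum` after each transfer and retry or raise an error on a mismatch.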

4.4 Data Management

This category takes an overview of the dataset, and assesses the procedures, protocols and policies that exist to ensure the data product has sufficient longevity to be useful. Lower scores indicate a greater risk that the dataset could become unusable or be lost, whereas higher scores indicate a lower risk.

Unsurprisingly, the preservation aspect assesses how well the data products are archived in case of system failures, personnel changes and the like. Although some metadata facets are assessed in the Usability and Usage category, and in some cases and contexts also in the Discoverability aspect, other types of metadata are also important, for example the provenance of the dataset. Here, metadata ranging from the collection level (e.g. a dataset consisting of a number of files or quantities) down to the granule level (the smallest manageable piece, e.g. a file within a dataset or a weather station history) are important.

Finally, dataset governance structures ensure that processes such as read-write permissions for the creation and release of datasets have been adhered to, indicating a formal approach rather than more ad-hoc efforts.

  • Preservation – assesses the security of the data such as backup procedures and retention policies.
  • Metadata – covers how detailed the descriptive metadata are, online availability of this information, and extent of compliance with international standards. It is also important for users to have up to date contact information for the dataset.
  • Governance – refers to the extent to which controls, accountabilities and compliance mechanisms are put in place, and their adherence to community best-practice.
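To illustrate how the four categories and their aspects could be used to identify gaps against a target level of stewardship maturity, the following minimal sketch stores the SMM-CD structure as a dictionary and lists the aspects rated below a chosen target. The scores shown are invented for illustration, the 1 to 5 numbering of levels is an assumption, and `aspects_below_target` is a hypothetical helper rather than part of any WMO tool.

```python
# Categories and aspects of the SMM-CD as listed in Sections 4.1 to 4.4.
SMM_CD = {
    "Data Access": ["Discoverability", "Accessibility"],
    "Usability and Usage": ["Data Portability", "Documentation", "Usage & Impact"],
    "Quality Management": [
        "Quality Assurance and Control Procedure",
        "Quality Assessment",
        "Data Integrity",
    ],
    "Data Management": ["Preservation", "Metadata", "Governance"],
}


def aspects_below_target(scores, target):
    """Return (category, aspect, score) for every aspect rated below
    `target`, ordered from the lowest score upwards."""
    gaps = []
    for category, aspects in SMM_CD.items():
        for aspect in aspects:
            score = scores[aspect]
            if score < target:
                gaps.append((category, aspect, score))
    return sorted(gaps, key=lambda gap: gap[2])


# Invented scores for a hypothetical dataset.
scores = {aspect: 3 for aspects in SMM_CD.values() for aspect in aspects}
scores["Preservation"] = 2
scores["Governance"] = 1

for category, aspect, score in aspects_below_target(scores, target=3):
    print(f"{category} / {aspect}: level {score}")
```

Listing individual aspects, rather than averaging them into a single number, keeps the per-aspect detail that users and data managers are encouraged to look at.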

5 The Stewardship Maturity Matrix for Climate Data for National and Regional Purposes (SMM-CD_NRP)

The rationale for developing a version of the SMM-CD for national and regional datasets is to address the operational focus of data management at the NMHSs. Their primary mission is to make national and regional datasets available to users. The SMM-CD_NRP therefore retains two main categories: Operational Data Management and Data Stewardship. These best inform the NMHSs on how to manage their data according to best management practices and standards at the national and regional levels. As with the SMM-CD, the goal is to provide NMHSs with a user-friendly self-assessment tool to help determine gaps in their data management and stewardship, and provide a structure to move towards improved practices and standards to attain a satisfactory level of competence in data management and stewardship.

The SMM-CD_NRP was heavily influenced by the structure of the SMM-CD, but with some simplification and merging of the categories, aspects and maturity levels to ensure they are more appropriate for national and regional purposes. In this section we outline how the SMM-CD_NRP differs from the SMM-CD. Full details on the SMM-CD_NRP are given in its Guidance Booklet and the Assessment Template (see Resources).

In Figure 4 below, the structure of the maturity scale of the SMM-CD_NRP is presented. In contrast to the SMM-CD, there are only three main levels, with an optional fourth, “highly desirable”, level. Although the progression of increasing stewardship maturity remains, the level of stewardship maturity required at each level is reduced to suit the target users of the SMM-CD_NRP.

Figure 4 

The maturity scale structure for the WMO SMM-CD_NRP.

The two categories of the SMM-CD_NRP both have a number of aspects, as outlined in Figure 5. In the Operational Data Management category there are five assessment aspects: Data Access, Data Portability, Data Preservation, Documentation and Data Integrity; and in the Data Stewardship category there are three aspects: Quality and Usage, Governance, and Metadata. Some of the aspects are very similar to those in the SMM-CD, whereas others are blends of two or more. Below we only go into detail for the few aspects which are different or unique in the SMM-CD_NRP.

Figure 5 

Diagram of SMM-CD_NRP Categories and Aspects.

5.1 Operational Data Management

This category addresses the assessment of operations that are required to enable access, portability, archival and documentation, and to ensure data integrity in the NMHS data holdings. Most aspects under this category are very similar to those in the SMM-CD. The aspect that has the greatest difference is Data Access, which assesses the extent to which a dataset can be found and accessed. A single aspect is used in the SMM-CD_NRP, blending the two aspects (Discoverability and Accessibility) under the Data Access category in the SMM-CD.

5.2 Data Stewardship

The second category of the SMM-CD_NRP provides a rating on how well a dataset is stewarded, by assessing the quality and usage, governance, and metadata aspects. These last two again are very similar to those in the SMM-CD. However, here the Quality Assurance and Control, Quality Assessment as well as Usage aspects have been blended together. This Quality and Usage aspect assesses the degree to which robust quality control is carried out on the data, along with quality flagging or error estimates, and the extent to which scientific peers trust the data in conducting research and compiling reports.
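The two categories and eight aspects described above can be written down as a small data structure. The following Python sketch is only an illustration: the dictionary layout, the function name `check_assessment` and the accepted level range (1 to 4, treating the optional "highly desirable" level as 4) are assumptions made here and are not part of the official Assessment Template.

```python
# Categories and aspects of the SMM-CD_NRP, as listed in the text.
SMM_CD_NRP = {
    "Operational Data Management": [
        "Data Access",
        "Data Portability",
        "Data Preservation",
        "Documentation",
        "Data Integrity",
    ],
    "Data Stewardship": [
        "Quality and Usage",
        "Governance",
        "Metadata",
    ],
}

# Assumption: three main levels (1-3) plus the optional fourth level.
MIN_LEVEL, MAX_LEVEL = 1, 4


def check_assessment(levels):
    """Return a list of problems found in a mapping of aspect name -> level."""
    expected = [a for aspects in SMM_CD_NRP.values() for a in aspects]
    problems = []
    for aspect in expected:
        if aspect not in levels:
            problems.append(f"missing aspect: {aspect}")
        elif not MIN_LEVEL <= levels[aspect] <= MAX_LEVEL:
            problems.append(
                f"level out of range for {aspect}: {levels[aspect]}"
            )
    for aspect in levels:
        if aspect not in expected:
            problems.append(f"unknown aspect: {aspect}")
    return problems
```

Calling `check_assessment({})` would report all eight aspects as missing, which may be a convenient starting checklist for a first self-assessment.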

6 Applying the SMM-CD/_NRP – Case Studies

We show in Tables 1 and 3 the lists of datasets that have been assessed so far (30th September 2020) under the SMM-CD and SMM-CD_NRP respectively. A complete and up-to-date list is kept on the WMO Climate Data Catalogue (https://climatedata-catalogue.wmo.int/assessed-datasets). It is the aspiration and intention that over time more datasets from a greater range of domains and platforms which are of relevance to the community that WMO serves will be assessed. This is within the remit of the newly formed WMO Services Commission (SERCOM) Expert Team on Data Requirements for Climate (ET-DRC) Services. The assessment procedure is a living process and so as datasets are updated, the ratings are likely to change. We now outline in more detail a number of case-studies showing how the matrices can be applied in practice.

6.1 Copernicus Climate Change Service (C3S) Sea Level Dataset (SMM-CD V4)

The European Union’s Copernicus programme aims at providing environmental observations of the Earth system for the ultimate benefit of all European citizens. The mission of the Copernicus Climate Change Service (C3S) is to provide consistent and authoritative information about climate change. The sea level Essential Climate Variable (ECV) product is of interest since sea level rise is one of the major consequences of climate change. Hence, it is essential to monitor the sea level changes observed on a global and regional scale. In this context, a sea level dataset based on satellite altimetry is available to users through the C3S Climate Data Store (CDS, https://cds.climate.copernicus.eu/). This daily, multi-mission merged, gridded dataset of sea level anomalies has been designed to ensure stability and homogeneity of the time series. It starts in 1993 and will be extended three times per year.

The data provider and several ET-DDS members have used the SMM-CD to assess the data management of this Climate Data Record (summarised in Table 2). In terms of ‘Data Access’, the grade is 5 for the ‘Discoverability’ aspect, since the dataset is searchable and easily available through the online institutional C3S catalogue, and 4.5 for ‘Accessibility’: the CDS interface and the associated toolbox are available, but no spatial subsetting is possible when downloading the data and all variables have to be downloaded together. Regarding ‘Usability and Usage’, the attributed grade for ‘Data Portability’ is 4.5 since the data are distributed as NetCDF (Network Common Data Form; Unidata, 2020) files, compliant with the Climate and Forecast (CF) Metadata Conventions (https://cfconventions.org/), but no other format is available. A grade of 5 is given for the ‘Documentation’ and ‘Usage’ aspects since the dataset is fully documented and has been referenced in international climate assessments and published reports. In terms of ‘Quality Management’, the quality assurance procedure is fully documented with an additional independent evaluation and quality control (EQC, C3S 2019) performed by the Copernicus service. Target requirements and a detailed gap analysis are available, and details of the error budget have been published in peer-reviewed journals, leading to a grade of 5 for both the ‘Quality Assurance’ and ‘Quality Assessment’ aspects. The same grade of 5 is given for ‘Data Integrity’, which is systematically verified with a standard approach to ensure that the distributed data are the same as the initial data files. Finally, regarding the ‘Data Management’ category, a grade of 4 is attributed for ‘Data Preservation’ since the data are distributed on an institutionally maintained platform and are archived following a defined and implemented procedure which agrees with community standards. A grade of 5 is given for both the ‘Metadata’ and ‘Governance’ aspects since the dataset is distributed with comprehensive metadata, detailed documentation and a versioning system, and governance aspects are well-defined within the E.U. Copernicus programme and are compliant with international standards.
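One use of such an assessment is to see which aspects leave room for improvement. The short Python sketch below lists the aspects of the C3S sea level dataset that score below the top level, using the grades quoted in the paragraph above. The function name `aspects_below_top` and the shortened aspect names are choices made for this illustration only.

```python
# Aspect grades for the C3S sea level dataset, as quoted in the text.
C3S_SCORES = {
    "Discoverability": 5,
    "Accessibility": 4.5,
    "Data Portability": 4.5,
    "Documentation": 5,
    "Usage": 5,
    "Quality Assurance": 5,
    "Quality Assessment": 5,
    "Data Integrity": 5,
    "Data Preservation": 4,
    "Metadata": 5,
    "Governance": 5,
}

TOP_LEVEL = 5  # the SMM-CD uses five maturity levels


def aspects_below_top(scores, top=TOP_LEVEL):
    """Return (aspect, score) pairs below the top level, lowest score first."""
    gaps = [(aspect, score) for aspect, score in scores.items() if score < top]
    return sorted(gaps, key=lambda pair: pair[1])


print(aspects_below_top(C3S_SCORES))
# [('Data Preservation', 4), ('Accessibility', 4.5), ('Data Portability', 4.5)]
```

For this dataset the list contains only three aspects, matching the limitations noted in the assessment (archiving, spatial subsetting and the single distribution format).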

Table 2

The scores and evidence for each aspect from the Copernicus Climate Change Service (C3S) Sea level dataset assessed by the SMM-CD. These have been taken from the assessment document available through the WMO Data Catalogue.


| CATEGORY | ASPECT | SCORE ACHIEVED | SCORE EVIDENCE |
|---|---|---|---|
| Data Access | Discoverability | 5 | Dataset is discoverable in the C3S online searchable Climate Data Store (CDS, https://cds.climate.copernicus.eu/) including overview and metadata description. Operational production is maintained, and temporal extensions are routinely provided. Procedures for data integration in the catalogue are defined and applied. |
| | Accessibility | 4.5 | Data is available through the institutional C3S CDS web interface with the possibility to select the period of interest. However, no spatial sub-setting is possible. All variables available must be downloaded together and data are made available on an ftp-based pull-mode access. Visualization is possible through the CDS toolbox (https://cds.climate.copernicus.eu/cdsapp#!/toolbox). |
| Usability & Usage | Data Portability | 4.5 | Data format is NetCDF-4 and follows Climate-Forecast (CF) conventions. The CDS toolbox is available and allows further processing and customization of the data by the users. |
| | Documentation | 5 | Documentation based on a standard C3S template is available online with a unique ID and version number. The production system is fully described in the documentation. Altimetry tutorials are available online (http://www.altimetry.info/) and use cases produced with the C3S toolbox are in the process of being published. |
| | Usage and Impact | 5 | Sea level rise is a direct consequence of climate change and thus, the altimeter sea level time series is cited in numerous peer-reviewed publications (CMEMS OSR#4: https://marine.copernicus.eu/wp-content/uploads/2020/06/OSR4_Summary_WEB_SinglePages.pdf), in institutional reports (C3S European State of the Climate Report: https://climate.copernicus.eu/ESOTC), international climate assessment reports (ESA SL_CCI http://www.esa-sealevel-cci.org/webfm_send/584, IPCC SROCC 2019: www.ipcc.ch/srocc/) and is also used in policy-making process (WMO State of the Global Climate report: https://library.wmo.int/doc_num.php?explnum_id=10211). |
| Quality Management | Quality Assurance and Control Procedure | 5 | QA/QC procedures are fully documented and applied to the full historical record and to the regular temporal extensions. Estimated accuracy numbers are available, derived from published studies of error characterization. The C3S EQC component aims at informing the users about the fitness for purpose of the datasets with an independent approach. Target requirements and gap analysis are available, and a dedicated user service desk considers user feedback (https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-sea-level-global?tab=doc). |
| | Quality Assessment | 5 | Product quality assurance procedure and assessment report are available in the data documentation. The dataset is produced and distributed within the European C3S. Detailed error budget has been produced, leading to uncertainty characterization, and results have been published in peer-reviewed journals (Ablain et al., 2015; Taburet et al., 2019; Ablain et al., 2019; Prandi et al., 2021). |
| | Data Integrity | 5 | The copy of data files from the production server to the diffusion platform is made using the “rsync” Unix command which includes a ‘checksum’ verification step. Data integrity is thus systematically verified with a standard approach to ensure that data received, archived and disseminated conform to the initial data files. |
| Data Management | Preservation | 4 | The C3S sea level data are distributed on an institutionally maintained platform. The architecture of the equipment required for the production, diffusion and backup systems is defined and described in the public technical documentation. The diffusion server consists of a main server and a redundant one (hosted in a separate backup data centre). The data are systematically stored, saved and archived using a secured internal repository, following a defined and implemented procedure which conforms to community standards. |
| | Metadata | 5 | The metadata available in each data file are compliant with international standards and support dataset provenance. Metadata is updated following each evolution of the input data. The input data are described in the Product User Guide and Specification (http://datastore.copernicus-climate.eu/documents/satellite-sea-level/D3.SL.1-v1.2_PUGS_of_v1DT2018_SeaLevel_products_v2.4.pdf) so that data product can be linked to the version of the data from which it was derived. |
| | Governance | 5 | The responsibility of the data production is clearly defined within the E.U. C3S. Point of contact is clearly defined. The entity in charge of the management of the data production and delivery service is audited annually. |
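The Data Integrity evidence in Table 2 refers to a checksum verification step when files are copied between servers. As a generic illustration of that idea (not the C3S procedure, and not a reimplementation of rsync), the following Python sketch compares SHA-256 digests of files in a source directory with their copies in a target directory. The function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(source_dir, target_dir):
    """Return relative paths of files that are missing or differ in target_dir."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = target_dir / rel
        # A file fails the check if its copy is absent or its digest differs.
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            mismatches.append(str(rel))
    return mismatches
```

An empty list returned by `verify_copies` would indicate that every file in the source directory has a byte-identical copy in the target directory. Recording the outcome of such a check is one possible form of evidence for this aspect.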

Table 3

Details of the datasets which have been assessed by the SMM-CD_NRP up to September 2020.


| DOMAIN | DATASET | INSTITUTION | TYPE | DATE OF ASSESSMENT | WEBPAGE | NOTES |
|---|---|---|---|---|---|---|
| Brazil | Temperature, Precipitation and Humidity | INMET | Automatic and Manual Weather Stations | 24.09.2020 | https://bdmep.inmet.gov.br | The annual number of stations varies according to the year of implementation of automatic stations and removal of conventional stations |
| Canada | Daily maximum and minimum temperatures, monthly mean temperature | Environment and Climate Change Canada | Homogenized data time series using observations from manual and automatic stations | 11.08.2020 | Daily data at http://crd-data-donnees-rdc.ec.gc.ca/CDAS/products/EC_data/AHCCD_daily/ and monthly data at ftp://ccrp.tor.ec.gc.ca/pub/AHCCD/ | |
| Germany | QuWind100 | DWD | model data | 20.08.2020 | https://www.dwd.de/DE/leistungen/quwind100/qu-wind_100.html | Download at https://opendata.dwd.de/climate_environment/CDC/grids_germany/multi_annual/wind_parameters/Project_QuWind100/ |
| Germany | Radar-based Precipitation Climatology for Germany | DWD | remote sensing data | 20.08.2020 | https://opendata.dwd.de/climate_environment/GPCC/radarklimatologie/ | Download at https://www.doi.org/10.5676/DWD/RADKLIM_RW_V2017.002 |
| France | DRIAS 2020 Climate simulations corrected over Metropolitan France | Météo-France | Climate simulation | 04.09.2020 | http://www.drias-climat.fr/ | |

6.2 National Precipitation and Wind Speed Dataset for Germany (SMM-CD_NRP)

Deutscher Wetterdienst (DWD) is the national meteorological service of Germany. To fulfil its legal tasks, it operates, among other things, an in-situ station network as well as a radar network for precipitation monitoring, which is currently based on 17 devices. DWD also operates a suite of numerical weather prediction (NWP) models.

6.2.1 Radar-based Precipitation Climatology for Germany

The DWD radar data are stored, which allows a climatological reprocessing of this very high-resolution observational data set (up to 1 km spatial and 5 minutes temporal resolution). A summary of the production steps for this dataset is as follows. The radar reflectivity was automatically quality controlled to remove or correct, for example, clutter and spikes. The next step was the adjustment of the radar data to the quality-controlled rain gauges; this procedure includes a verification step and an improvement of the adjustment using a subset of the gauges which were not part of the original adjustment (Winterrath et al., 2017). The data set was developed within the research project RADKLIM (Winterrath et al., 2017) and has now been transferred to an operational data product. The driver is the assumption that the majority of heavy precipitation events were missed by gauges, now confirmed by Lengfeld et al. (2020). This data set starts in 2001 and is extended by the observations of the previous year on an annual basis.

The Radar-based Precipitation Climatology for Germany was assessed by a member of the IEG-CDM in consultation with the dataset creators. The overall rating for the ‘Operational Data Management’ category is level 2.4, and level 2.3 in ‘Data Stewardship’. Regarding the individual aspects, ‘Data Access’ has level 2 as the dataset is distributed through the national climate data portal via sftp/https, downloadable in monthly chunks. As it is available as binary and ASCII files with a GIS header and a manual, ‘Data Portability’ has level 3, with ‘Data Preservation’ at the same level, as the backup procedure follows institutional archiving practices. The ‘Documentation’ aspect also reached level 3, as the dataset was created within a collaborative research project and interim and final reports as well as papers describe the creation of the dataset. ‘Data Integrity’ has level 1, as no checks on the integrity of files were applied during the copy process. The ‘Quality and Usage’ aspect gains level 3 as the data were quality controlled by means of a defined and reviewed QC procedure. The ‘Governance’ and ‘Metadata’ aspects both achieve level 2, as the dataset follows standard procedures along with accountabilities, and user contact information is provided as well as the coordinates and projection of the grid.

6.2.2 QuWind100

Germany is striving to increase its amount of renewable energy capacity, especially the ratio of photovoltaics and wind energy (DEU, 2019). Amongst other datasets developed for this sector, DWD developed QuWind100, a dataset for mean wind speed at 100 and 200 metres above ground (the typical height of wind turbine hubs). With this dataset, suitable locations for wind turbines can be defined as well as expected yields estimated. A reference period of 1981 to 2010 was chosen and projections over 2021 to 2050 estimated based on the RCP8.5 emissions scenario. An innovative model chain was developed to close the gap between the available in-situ and needed data by combination of the mesoscale model COSMO-CLM and the boundary layer model HIRVAC2D (Starke et al, 2019). The data set was developed in a collaborative research project of DWD and the Technische Universität Dresden.

QuWind100 was also assessed by a member of the IEG-CDM in consultation with the dataset creators. The rating for the ‘Operational Data Management’ category is level 3, and 2.3 in the ‘Data Stewardship’ category. Within the aspects, ‘Data Access’ and ‘Data Portability’ gain level 4, as the data are available via the German Climate Data Centre in the OpenData portal and also via web services as NetCDF and comma-separated value (CSV) files for the whole country as well as user-defined subregions, in seasonal and annual chunks. Level 3 is reached by the aspects ‘Data Preservation’ and ‘Documentation’, as a back-up copy is held following institutional practices and a final report of the research project including a verification is available. As no systematic data integrity checks were done, the aspect ‘Data Integrity’ achieves level 1. The ‘Quality and Usage’ aspect reaches level 3 as input data from stations operated by DWD run through a defined, documented and audited QC routine. The two aspects ‘Governance’ and ‘Metadata’ gain level 2 as basic information like the latitude and longitude of the grid cells as well as the elevation above ground are provided, accountabilities are defined and user contact information is given.
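The category ratings quoted for the two DWD datasets are consistent with a simple arithmetic mean of the aspect levels within each category, rounded to one decimal place. The Python sketch below reproduces them from the aspect levels given in the text. Treating the category rating as an unweighted mean is an assumption inferred from these numbers, and the function name `category_ratings` is hypothetical.

```python
# Aspect levels for the two DWD datasets, as quoted in the text.
RADKLIM = {
    "Operational Data Management": {
        "Data Access": 2,
        "Data Portability": 3,
        "Data Preservation": 3,
        "Documentation": 3,
        "Data Integrity": 1,
    },
    "Data Stewardship": {"Quality and Usage": 3, "Governance": 2, "Metadata": 2},
}

QUWIND100 = {
    "Operational Data Management": {
        "Data Access": 4,
        "Data Portability": 4,
        "Data Preservation": 3,
        "Documentation": 3,
        "Data Integrity": 1,
    },
    "Data Stewardship": {"Quality and Usage": 3, "Governance": 2, "Metadata": 2},
}


def category_ratings(assessment):
    """Average the aspect levels in each category (assumed unweighted mean)."""
    return {
        category: round(sum(levels.values()) / len(levels), 1)
        for category, levels in assessment.items()
    }


print(category_ratings(RADKLIM))
# {'Operational Data Management': 2.4, 'Data Stewardship': 2.3}
print(category_ratings(QUWIND100))
# {'Operational Data Management': 3.0, 'Data Stewardship': 2.3}
```

In both cases the low ‘Data Integrity’ level pulls down the Operational Data Management average, which shows how a single aspect can dominate the gap between otherwise similar datasets.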

6.3 Third Generation of Homogenized Temperature Datasets (Monthly and Daily) for Canada (SMM-CD_NRP)

The Climate Research Division of Environment and Climate Change Canada has produced the third generation of homogenized (surface air) temperature datasets (AHCCD_SAT_G3, Vincent et al. 2020). These include homogenized versions of both daily minimum and maximum temperatures, as well as the derived monthly means, for 780 locations across Canada, the majority of which are currently active stations with long data records. This is a research data product suitable for climate trend analysis. The production procedure includes quality control, adjusting daily minimum temperatures to diminish the effects of the change in observing time in 1961, joining of data records from nearby sites to form a long data time series, homogeneity testing of each time series of annual and seasonal mean temperatures, and homogenization of each daily and monthly data series by adjusting the raw data series to diminish all the non-climatic shifts identified for that raw data series (Vincent et al. 2020).

The data stewardship assessment with the SMM-CD_NRP was made by the data producer team, one of whom is an ET-DDS member. The averaged rating level is 2.4 for Operational Data Management, and 2.3+ for Data Stewardship, with 3.0 being the maximum rating of the mandatory maturity criteria. Specifically, the Data Access, Portability, and Integrity aspects received a Level 2 rating, because (1) the datasets are available via ftp as a whole without enhanced online data services (e.g., there is no option to download a subset of the dataset); (2) the data and metadata are separate in ASCII format, not directly usable in a geospatial environment such as ArcGIS; and (3) the data were produced with random data integrity checks. The Data Preservation and Documentation aspects received a Level 3 rating, because (1) both the raw and homogenized data, as well as the computer programs to produce the datasets, are routinely backed up on several servers; and (2) documentation on the methods to produce the datasets (including the related published journal papers) and on data format and filename convention is available online. The Quality and Usage aspect of Data Stewardship received a Level 3+ rating, because the datasets were produced with comprehensive QA/QC and homogenization procedures, and have been well cited in peer-reviewed journals and used by other well-known climate data centers. The Governance and Metadata aspects received a Level 2 rating, because (1) a standard procedure and approval process was followed, along with accountabilities and a compliance mechanism for ensuring that the data are secure, accessible, and useable; and (2) limited collection-level metadata are available for the users but do not conform to community standards.

6.4 Fundamental Temperature, Precipitation and Humidity Over Brazil (SMM-CD_NRP)

The National Meteorological Service of Brazil, hereafter INMET, has made the Brazilian Climate data (Brazilian-CD) freely available online to the public as CSV text files. The Brazilian-CD comprises data from all observing weather stations operated and maintained by INMET, either automatic or conventional stations, with the total number varying according to year, from about 400 in 2000 to 834 stations in 2020. The WMO SMM-CD_NRP was used to assess the temperature, humidity and precipitation data of the Brazilian-CD. The averaged stewardship maturity rating levels for the Operational Data Management and Data Stewardship categories are 2.5 and 1.7, respectively. The assessment was originally made by an expert from INMET and was moderated by one member of the WMO ET-DDS team.

The online publication of the Brazilian-CD meets important criteria for publicity, such as data access, portability and preservation, in the operational data management category. However, within the same category there is room for improving the documentation and data integrity, owing to the lack of information on, or even application of, these aspects on the website. Regarding the data stewardship category, no quality assurance or quality control procedures are described in the online documentation of this dataset, but a further communication with the assessment point-of-contact (POC) clarified that the dataset is under a routine procedure of quality control. Furthermore, despite the dataset being provided by INMET, no explicit information on governance or POC is given, which lowers the score for this aspect. The Brazilian-CD is available online with minimal metadata information, regarding the observing station location, altitude and period of operation. In addition, climate normal parameters can be found as figures and tables by simply consulting the map of the station, though such information is not integrated into the downloaded CSV file. Nonetheless, substantial progress has been made recently by INMET in improving the informational content of the Brazilian Climate data for users, as nationally reported and also referred to by relevant centres such as NOAA CPC and IRI.

6.5 DRIAS 2020 – National Climate Simulations Over France (SMM-CD_NRP)

The new DRIAS 2020 dataset provides high resolution bias-adjusted climate projections over France in a variety of graphical or numerical forms.

Through the EUROCORDEX initiative (Jacob et al, 2014), regional climate models were implemented on a limited area domain covering Europe at a horizontal resolution of 12 km. The regional climate simulations were dynamically downscaled from global climate projections of the Coupled Model Intercomparison Project – Phase 5 (CMIP5, 2011). The DRIAS service has identified a consistent subset of this EUROCORDEX climate projection ensemble, which is more manageable for decision-making and impact studies, while preserving the range and characteristics of regional responses in metropolitan France. All simulations were produced for the RCP4.5 and RCP8.5 emission scenarios, and eight simulations were also produced for RCP2.6. They are available over the period 1971-2100 (1971-2005 being the historical reference part of the simulations and 2006-2100 the climate projection part). The ADAMONT (Verfaillie et al, 2017) bias adjustment method was then applied over France to the EUROCORDEX climate model outputs using the SAFRAN (Quintana-Segui et al, 2008) reanalysis data as reference. The resulting simulations are available at a daily time step on an 8 × 8 km horizontal grid and form the new DRIAS 2020 dataset.

Aspects in the Operational Data Management category generally present a high level of maturity: the Data Access, Portability, Preservation and Documentation aspects received a Level 3 rating. Data can be accessed online after a quick registration procedure, a catalogue of data and products is available, and the user can select and download a subset of the dataset, in ASCII or NetCDF formats. For the preservation aspect, there is a multi-site backup system: the Météo-France archive and the CNRM ESGF archive system. A great effort has been put into documentation, with dedicated “Education” and “Discover” sections. Data Integrity is an aspect that can be improved, as there is no online documentation of data integrity checks.

Aspects on data stewardship are evaluated between levels 2 and 3. Quality and Usage is at 2.5: data quality is assessed, but no specific documentation on this aspect is available. However, the national report “The climate of France in the 21st century” is based on the DRIAS dataset and products, demonstrating its high profile. As for governance, DRIAS is a multi-partnership endeavor between the Direction of Climatology at Météo-France (coordination and services implementation) and the main French organizations involved in climate modeling (IPSL, CERFACS, CNRM-GAME), which, in addition to climate simulations, bring their scientific expertise on how to use tools and interpret results. User support is also provided. The Metadata aspect is at Level 2 for ASCII data but Level 3 for NetCDF data, which follow the CMOR standard.

7 Summary

We have described the development of a structured and standards-based method of informing both data users and data managers of climate datasets about recommended stewardship practices, as well as providing an assessment process. The Stewardship Maturity Matrix for Climate Data (SMM-CD), which has now been approved by the WMO Congress, is a self-assessment tool to score the maturity of individual data products that are global in scope. There are four categories (data access, usability and usage, quality management, and data management) each with a number of aspects, which are assessed to be at one of five levels. Datasets which have been assessed by the SMM-CD are collected in the WMO Climate Data Catalogue, which will grow over time.

Furthermore, a matrix for national and regional purposes (SMM-CD_NRP) has been derived from the SMM-CD, by blending the categories and aspects, resulting in a smaller set (operational data management and data stewardship). The SMM-CD_NRP is for data products which are produced more operationally and have a smaller geographic range. These are more likely to be produced by NMHSs and have a lower processing level and hence be closer to the basic observations and climate data record. We also present a number of case studies where these matrices have been applied.

The SMM-CD_NRP has already been shown to be of great value in assessing the way that climate datasets are provided. Although standardization is not the goal of its implementation, the results from the case studies presented herein have shown differences in the averaged scores in both categories, differences being slightly more evident in data stewardship practices. Contrasting the case studies, we noticed that evidence of important aspects, such as quality assurance, quality control and governance, is not easily available for some datasets. In addition, not all NetCDF datasets follow the CF compliance requirements for metadata. Nonetheless, we also notice that, at this first implementation stage of the matrix, the climate datasets being evaluated have good data stewardship practices which will give confidence to their users, as they scored very close to the highest mandatory level in most aspects. This is already an important achievement of the SMM-CD_NRP, by which providers can use the information to improve their data services.

The WMO manual on HQ-GDMFC will incorporate this matrix and be distributed to all Member states and their NMHSs. In addition, this manuscript, the matrix and its guidance notes will be a reference for data managers and others managing and producing climate data, for the application of the SMM-CDs in monitoring climate change and climate services. As with any other set of standards, this manual is subject to updates and amendments as needs arise and the requirements evolve. The WMO Standing Committee on Climate Services has this in its mandate.

Going forward, we envisage that assessing against these maturity matrices becomes a regular part of dataset development and release, both in climate as well as forecasting services. The new WMO SERCOM ET-DRC Services will continue the work of the ET-DDS and IEG-CDM, by adding more datasets at the global, regional and national level to the WMO Climate Data Catalogue, and updating those already included where necessary. This catalogue is to become part of the new WMO Climate Data Portal in the near future.

Acknowledgements

The authors acknowledge the National Meteorological Service of Brazil, INMET, for kindly providing the assessment of its Climate Dataset, as available to the public on its web page, which was used in this work.

We thank the two anonymous reviewers whose detailed comments helped improve this manuscript.

We thank Jay Lawrimore, Huai-min Zhang, Boyin Huang (NOAA/NCEI), Gavin Schmidt, Reto Ruedy (NASA/GISS), Hans Hersbach, Paul Berrisford (ECMWF), Nick Rayner, John Kennedy & Colin Morice (UKMOHC) for assisting with the assessments of temperature and reanalysis datasets and also for interesting discussions.

We thank David Gallaher, Amy Steiker, Bruce Raup & Florence Fetterer (NSIDC) for assisting with the assessments of the cryospheric datasets, Markus Ziese for the assessment of the GPCC dataset and the QuWind100 and Radar-based Precipitation Climatology for Germany National datasets, Henry Reges and his team for the assessment of the CoCoRaHS dataset. We thank Anny Cazenave (LEGOS) and Jean-François Legeais (CLS) for the assessment of the global sea level datasets, Brigitte Dubuisson (Météo-France) for the assessment of the DRIAS 2020 Climate simulations corrected over Metropolitan France dataset and Xiaolan Wang (Environment and Climate Change Canada) for the assessment of the Daily maximum and minimum temperatures, monthly mean temperature National dataset. We thank Peter Siegmund (KNMI) for the assessment of the Climate Extreme Index dataset, Ulrich Looser (GRDC) for the GRDC dataset assessment and Tim Boyer (NOAA WOD) and Eric Freeman (NOAA ICOADS) for the Marine datasets assessments.

Funding Information

Robert Dunn was supported by the Met Office Hadley Centre Climate Programme funded by BEIS and Defra. Ge Peng was supported by NOAA National Centers for Environmental Information (NCEI) through the Cooperative Institute for Satellite Earth System Studies (CISESS) under Cooperative Agreement NA19NES4320002. Markus Donat was supported by the Spanish Ministry for the Economy, Industry and Competitiveness, grant reference RYC-2017-22964. KNMI hosted the IEG-CDM workshop and the development and maintenance of the WMO Catalogue for Climate Data.

Competing Interests

The authors have no competing interests to declare.

RESOURCES

WMO Climate Data Catalogue of assessed datasets – https://climatedata-catalogue.wmo.int/assessed-datasets.

Final Report of WMO International Workshop on Information Management (WWIM), 2–4 October 2017, Geneva, CH, https://wis.wmo.int/file=3799 (WMO, 2018a, accessed 19 August 2020).

Final Report of WMO Expert Meeting on Climate Data Modernisation, 16–18 April, 2018, De Bilt, NL, https://www.wmo.int/pages/prog/wcp/ccl/opace/opace1/meetings/documents/DraftMeetingReport.pdf (WMO, 2018b, accessed 19 August 2020).

SMM-CD Guidance Booklet v7 – https://figshare.com/articles/The_manual_for_the_WMO-Wide_Stewardship_Maturity_Matrix_for_Climate_Data/7002482 (accessed 19-8-2020) (Peng et al, 2019).

SMM-CD Assessment template v4 – https://figshare.com/articles/The_template_for_the_WMO-Wide_Stewardship_Maturity_Matrix_for_Climate_Data/7003709 (accessed 19-8-2020) (Lief and Peng, 2019).

SMM-CD_NRP Guidance Booklet v1 https://figshare.com/articles/online_resource/WMO_SMM-CD_NRP_Guidance_Booklet/13004606.

SMM-CD_NRP Assessment template v1 https://figshare.com/articles/online_resource/WMO_SMM-CD_NRP_v01r00_20200805_Assessment_Template_docx/13004018.

Expert Team on Data Development and Stewardship (ET-DDS):

Ali Addenjal (Libyan National Meteorology Center), Christina Lief (ret. NOAA), David Sinclair (BOM), Brigitte Dubuisson (Météo-France), Fatima Hdidou (Morocco Meteorological Service), Ge Peng (NOAA), José Guijarro (AEMET), Markus Donat (BSC), Reinaldo Silveira (SIMEPAR), Xiaolan L. Wang (ECCC), Lipeng Jiang (CMA).

International Expert Group on Climate Data Modernisation (IEG-CDM):

Valentin Aich (GCOS), Omar Baddour (WMO), Cedric Bergeron (ECMWF), Dominique Berod (WMO), Thorsten Busselberg (DWD), Anny Cazenave (LEGOS), Robert Dunn (UKMO), David Gallaher (ret. NSIDC), Lydia Gates (DWD), Jean-François Legeais (Collecte Localisation Satellites), Christina Lief (ret. NOAA), Anna Milan (NOAA, now WMO), Ge Peng (NOAA), Kate Roberts (BOM), Peter Siegmund (KNMI), William Wright (ret. BOM), Markus Ziese (DWD).

References

  1. Ablain, M, Cazenave, A, Larnicol, G, Balmaseda, M, Cipollini, P, Faugère, Y, Fernandes, MJ, Henry, O, Johannessen, JA, Knudsen, P and Andersen, O. 2015. Improved sea level record over the satellite altimetry era (1993–2010) from the Climate Change Initiative project. Ocean Science, 11(1): 67–82. DOI: https://doi.org/10.5194/os-11-67-2015 

  2. Ablain, M, Meyssignac, B, Zawadzki, L, Jugier, R, Ribes, A, Spada, G, Benveniste, J, Cazenave, A and Picot, N. 2019. Uncertainty in satellite estimates of global mean sea-level changes, trend and acceleration. Earth System Science Data, 11(3): 1189–1202. DOI: https://doi.org/10.5194/essd-11-1189-2019 

  3. Bahim, C, Casorrán-Amilburu, C, Dekkers, M, Herczog, E, Loozen, N, Repanas, K, Russell, K and Stall, S. 2020. The FAIR Data Maturity Model: An Approach to Harmonise FAIR Assessments. Data Science Journal, 19(1): 41. DOI: https://doi.org/10.5334/dsj-2020-041 

  4. CMIP5. 2011. International CLIVAR Project Office (ICPO) 2011. CLIVAR Exchanges – Special Issue: WCRP Coupled Model Intercomparison Project – Phase 5 – CMIP5 (CLIVAR Exchanges, No. 56 (Vol 16(2)) Southampton, GB. International CLIVAR Project Office 51pp. https://www.clivar.org/sites/default/files/documents/Exchanges56.pdf 

  5. C3S. 2019. Evaluation & Quality Control of the C3S, EQS team, C3S General Assembly, October 2019. https://climate.copernicus.eu/sites/default/files/2019-11/06%20Obregon_EQC_GA.pdf (accessed 11 September 2020). 

  6. DEU. 2019. Klimaschutzprogramm 2030 der Bundesregierung zur Umsetzung des Klimaschutzplans 2050. Available online at: https://www.bundesregierung.de/resource/blob/975226/1679914/e01d6bd855f09bf05cf7498e06d0a3ff/2019-10-09-klima-massnahmen-data.pdf?download=1 (accessed 8-9-2020). 

  7. EUMETSAT. 2013. CORE-CLIMAX Climate Data Record Assessment Instruction Manual. Version 2, 25 November 2013. Available online at: https://www.eumetsat.int/website/home/Data/ClimateService/index.html. 

  8. Freeman, E, Woodruff, SD, Worley, SJ, Lubker, SJ, Kent, EC, Angel, WE, Berry, DI, Brohan, P, Eastman, R, Gates, L and Gloeden, W. 2017. ICOADS Release 3.0: a major update to the historical marine climate record. International Journal of Climatology, 37(5): 2211–2232. DOI: https://doi.org/10.1002/joc.4775 

  9. Jacob, D, et al. 2014. EURO-CORDEX: new high-resolution climate change projections for European impact research. Regional Environmental Change, 14(2): 563–578. DOI: https://doi.org/10.1007/s10113-013-0499-2 

  10. Huang, B, Thorne, PW, et al. 2017. Extended Reconstructed Sea Surface Temperature version 5 (ERSSTv5), Upgrades, validations, and intercomparisons. J. Climate. DOI: https://doi.org/10.1175/JCLI-D-16-0836.1 

  11. Huang, B, Liu, C, Ren, G, Zhang, H-M and Zhang, L. 2018. The role of buoy and Argo observations in two SST analyses in the global and tropical Pacific oceans. J. Climate, 32: 2517–2535. DOI: https://doi.org/10.1175/JCLI-D-18-0368.1 

  12. Huang, B, Angel, W, Boyer, T, Cheng, L, Chepurin, G, Freeman, E, Liu, C and Zhang, H-M. 2018. Evaluating SST analyses with independent ocean profile observations. J. Climate, 31: 5015–5030. DOI: https://doi.org/10.1175/JCLI-D-17-0824.1 

  13. Kennedy, JJ, Rayner, NA, Smith, RO, Saunby, M and Parker, DE. 2011a. Reassessing biases and other uncertainties in sea-surface temperature observations since 1850 part 1: measurement and sampling errors. J. Geophys. Res., 116: D14103. DOI: https://doi.org/10.1029/2010JD015218 

  14. Kennedy, JJ, Rayner, NA, Smith, RO, Saunby, M and Parker, DE. 2011b. Reassessing biases and other uncertainties in sea-surface temperature observations since 1850 part 2: biases and homogenisation. J. Geophys. Res., 116: D14104. DOI: https://doi.org/10.1029/2010JD015220 

  15. Lengfeld, K, Kirstetter, P-E, Fowler, HJ, Yu, J, Becker, A, Flamig, Z and Gourley, J. 2020. Use of radar data for characterizing extreme precipitation at fine scales and short durations. Environmental Research Letters, 15: 085003. DOI: https://doi.org/10.1088/1748-9326/ab98b4 

  16. Lief, C and Peng, G. 2019. The WMO Stewardship Maturity Matrix for Climate Data (SMM-CD) Template. Document ID: WMO-SMM-CD-0003. Updated 2020. Version: v04r01 20200615. Figshare. DOI: https://doi.org/10.6084/m9.figshare.7003709 

  17. Lief, C, Wright, W, Peng, G, Baddour, O, Siegmund, P, Berod, D, et al. 2020. The High Quality Global Data Management Framework for Climate: Improving the Quality of Data Management for Better Climate Monitoring. ESIP. Presentation. DOI: https://doi.org/10.6084/m9.figshare.12712001.v1 

  18. Lin, D, Crabtree, J, Dillo, I, Downs, RR, Edmunds, R, Giaretta, D, De Giusti, M, L’Hours, H, Hugo, W, Jenkyns, R and Khodiyar, V. 2020. The TRUST Principles for digital repositories. Scientific Data, 7(1): 1–5. DOI: https://doi.org/10.1038/s41597-020-0486-7 

  19. Peng, G, Downs, R, Lacagnina, C, Ramapriyan, R, Ivanova, I and others. 2020. Call to Action for Global Access to and Harmonization of Quality Information of Individual Earth Science Datasets. Data Science Journal, under review. The preprint is available at: DOI: https://doi.org/10.31219/osf.io/nwe5p 

  20. Peng, G, Gross, WS and Edmunds, R. 2021. Crosswalks Among Stewardship Maturity Models Promoting Trustworthy FAIR Data and Repositories. In prep. 

  21. Peng, G, Privette, JL, Kearns, EJ, Ritchey, NA and Ansari, A. 2015. A unified framework for measuring stewardship practices applied to digital environmental datasets. Data Science Journal, 13. DOI: https://doi.org/10.2481/dsj.14-049 

  22. Peng, G, Wright, W, Baddour, O, Lief, C and the SMM-CD Work Group. 2019. The guidance booklet on the WMO-Wide Stewardship Maturity Matrix for Climate Data. Figshare. DOI: https://doi.org/10.6084/m9.figshare.7002482 

  23. Prandi, P, Meyssignac, B, Ablain, M, et al. 2021. Local sea level trends, accelerations and uncertainties over 1993–2019. Sci Data, 8, 1. DOI: https://doi.org/10.1038/s41597-020-00786-7 

  24. Quintana-Seguí, P, Le Moigne, P, Durand, Y, Martin, E, Habets, F, Baillon, M, Canellas, C, Franchisteguy, L and Morel, S. 2008. Analysis of Near-Surface Atmospheric Variables: Validation of the SAFRAN Analysis over France. J. Appl. Meteor. Climatol., 47: 92–107. DOI: https://doi.org/10.1175/2007JAMC1636.1 

  25. Starke, M, Leiding, T, Stahn, P, Gassdorf, Th, Haller, M, Walter, A, Bernhofer, Ch and Ziemann, A. 2019. https://opendata.dwd.de/climate_environment/CDC/grids_germany/multi_annual/wind_parameters/Project_QuWind100/Abschlussbericht_QuWind100.pdf. 

  26. Taburet, G, Sanchez-Roman, A, Ballarotta, M, Pujol, M-I, Legeais, J-F, Fournier, F, Faugere, Y and Dibarboure, G. 2019. DUACS DT2018: 25 years of reprocessed sea level altimetry products. Ocean Sci., 15: 1207–1224. DOI: https://doi.org/10.5194/os-15-1207-2019 

  27. Unidata. 2020. NetCDF software. DOI: https://doi.org/10.5065/D6H70CW6 

  28. Verfaillie, D, Déqué, M, Morin, S and Lafaysse, M. 2017. The method ADAMONT v1.0 for statistical adjustment of climate projections applicable to energy balance land surface models. Geoscientific Model Development, 10. DOI: https://doi.org/10.5194/gmd-10-4257-2017 

  29. Vincent, LA, Hartwell, MM and Wang, XL. 2020. A Third Generation of Homogenized Temperature for Trend Analysis and Monitoring Changes in Canada’s Climate. Atmosphere-Ocean. DOI: https://doi.org/10.1080/07055900.2020.1765728 

  30. Wilkinson, M, Dumontier, M, Aalbersberg, I, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data, 3: 160018. DOI: https://doi.org/10.1038/sdata.2016.18 

  31. Winterrath, T, Brendel, C, Hafer, M, Junghänel, T, Klameth, A, Walawender, E, Weigl, E and Becker, A. 2017. Erstellung einer radargestützten Niederschlagsklimatologie. Deutscher Wetterdienst. https://www.dwd.de/DE/leistungen/pbfb_verlag_berichte/pdf_einzelbaende/251_pdf.pdf?__blob=publicationFile&v=2. 

  32. WMO. 2014. Commission for Climatology, Sixteenth Session, WMO-No. 1137. https://library.wmo.int/doc_num.php?explnum_id=5560. 

  33. WMO. 2018a. Final Report of WMO International Workshop on Information Management (WWIM), 2–4 October 2017, Geneva. https://wis.wmo.int/file=3799 (accessed 19 August 2020). 

  34. WMO. 2018b. Final Report of WMO Expert Meeting on Climate Data Modernisation, 16–18 April, 2018. De Bilt. https://www.wmo.int/pages/prog/wcp/ccl/opace/opace1/meetings/documents/DraftMeetingReport.pdf (accessed 19 August 2020). 

  35. WMO. 2019a. “Resolution 22 (Congress-18)” in “Resolutions and Decisions of Congress and Executive Council”. WMO-No. 508. https://library.wmo.int/index.php?lvl=notice_display&id=15822. 

  36. WMO. 2019b. Manual on the High-quality Global Data Management Framework for Climate. WMO-No. 1238. https://library.wmo.int/index.php?lvl=notice_display&id=21686.