1 Introduction and Background

All climate services, from data provision through seasonal climate forecasting to the monitoring of, and adaptation to, climate variability and change, as well as disaster risk reduction, depend on high-quality and well-managed climate data. Among the numerous challenges to implementing quality climate services at both the global and national levels is that much of the existing guidance on climate data management struggles to keep pace with rapid advances in technology, current community best practices and user requirements. Although there are currently opportunities to benefit from these advances, in many cases the capacity to perform good data management is lacking, a situation further hindered by non-standardised terminology and the absence of a suitable regulatory framework. Hence, it is important to ensure that a robust regulatory framework, defining standard and recommended practices and procedures for the management of climate data, exists and is agreed internationally.

The World Meteorological Organization (WMO), as the recognised global authority on weather, water and climate, has sought to address these issues by developing a High-Quality Global Data Management Framework for Climate (HQ-GDMFC). The framework is intended to enable the effective development of stewardship processes and the exchange of high-quality climate data, based on a reliable, integrated underpinning data infrastructure at the global, regional and national levels. The HQ-GDMFC contains the following three components (Figure 1):

Figure 1 

A schematic diagram showing three components of the WMO High-Quality Global Data Management Framework for Climate (HQ-GDMFC).

  1. A Manual on HQ-GDMFC, which establishes the regulatory framework and recommended best practices around climate data management, applicable to all entities with a mandate to manage climate data;
  2. A Stewardship Maturity Matrix for Climate Data (SMM-CD), an evaluation tool for quantitatively assessing the stewardship maturity of climate data, which may be thought of as the level of capability exhibited in managing climate datasets;
  3. A catalogue of climate datasets, both international datasets and those of individual National Meteorological and Hydrological Services (NMHS) or other entities, that have been through the assessment process.

This is the first time that the climate community has established such a regulatory framework specifically for the management of climate data, although other frameworks have previously been proposed by collaborative projects (e.g. the CORE-CLIMAX maturity matrix []) or used in a national context (e.g. the NOAA data stewardship maturity matrix []), as outlined below.

The HQ-GDMFC promotes establishment of, and compliance with, standards and recommended practices for sourcing, securing, managing, assessing, and cataloguing climate data, and for sharing infrastructure such as for data exchange, analysis and data service provision. The manual describing these standards and recommended practices () was adopted by WMO () as part of the WMO Technical Regulations. As outlined in this HQ-GDMFC manual (), the scope of international collaboration within HQ-GDMFC is based on a set of principles:

  1. Promoting adherence to relevant WMO data policies;
  2. Registering datasets to be shared internationally for use in climate studies, monitoring and applications;
  3. Facilitating easy access to metadata and documentation underpinning the datasets;
  4. Promoting preservation and sound, standards-based management of all data that are used for, or may potentially be useful for, climate-change monitoring, including backing up in duplicate repositories for the duration of their specified retention periods;
  5. Assessing and improving the maturity and quality of stewardship practices underpinning the datasets, cataloguing them for easy search, discovery and access, and promoting their use in informing policy-relevant frameworks; and
  6. Promoting acquisition of user feedback on the quality, fitness for purpose and usability of shared datasets.

The focus of this manuscript is the Stewardship Maturity Matrix for Climate Data (SMM-CD) and the subsidiary SMM-CD for National and Regional Purposes (SMM-CD_NRP). We outline herein the creation and use of these maturity matrices, which assess aspects of data stewardship. Put simply, they assess how well a dataset has been created and curated to ensure the accessibility, usability and integrity of the data, and whether sufficient documentation is available for data users. The assessment is necessarily limited to those facets which can be (independently) assessed. The matrices do not explicitly assess the scientific rigour involved in creating the dataset, e.g., how reliable the underpinning observations are, or the details of processing, homogenisation and scientifically based adjustments. What they do provide is information on the extent to which a dataset has clear documentation and support channels, is constructed using clear coding practices, applies quality control and assurance procedures, provides uncertainty estimates, and adheres to data format and archiving standards. Datasets fulfilling these criteria may well contain reliable information and be supported and available over a long period. However, it is important that users of assessed datasets combine this information with other sources to make an appropriate choice for their application.

We outline the rationale behind the construction of a Stewardship Maturity Matrix in Section 2, as well as the process which lies behind the SMM-CD in Section 3. We detail the SMM-CD as well as the SMM-CD_NRP for national and regional purposes in Sections 4 & 5. The datasets assessed so far as well as some case studies are presented in Section 6 and we summarise in Section 7. Links to the current SMM-CD and SMM-CD_NRP documents along with other supporting information are provided at the end of this manuscript.

2 Rationale for Making a Stewardship Maturity Matrix

A data stewardship maturity assessment model can be used not only as a guide for users about the rigour of data stewardship practices, but also as a tool for monitoring and improving aspects of organizational performance in producing, managing or servicing climate data. It is typically presented as a two-dimensional matrix. The rows identify the various facets of core stewardship functionality (e.g., data management), while the columns describe typical behaviours representing increasing maturity in practices and capability for each aspect, ranging from a poorly managed or no-capability state to an advanced, well-managed state.

A number of maturity matrices that can be applied to climate data to evaluate the maturity of various data quality attributes already exist, e.g., the NOAA/NCEI Data Stewardship Maturity Model (DSMM) () and the EUMETSAT CORE-CLIMAX Production System Maturity Matrix (SMM) (). The CORE-CLIMAX SMM measures the maturity of the systems that produce datasets of essential climate variables (ECVs), while the DSMM measures the maturity of how digital datasets are managed within the context of the Open Archival Information System (OAIS) Reference Model. Both are important for ensuring and improving the overall quality of climate datasets for users and policy makers. A matrix developed and supported by WMO has the advantage of carrying a mandate for current best practice that applies to all countries, nations and territories. Through the efforts of the International Expert Group on Climate Data Modernisation (IEG-CDM), the WMO has developed and baselined the SMM-CD (), leveraging community best practices and standards captured in those existing maturity assessment models to help ensure and improve the trustworthiness of climate datasets in the WMO data catalogue.

Along with the domain-specific approaches of the two matrices above, there are other data stewardship principles relevant to the SMM-CD. For example, the FAIR (Findable, Accessible, Interoperable and Reusable) Guiding Principles () are fundamental to machine-enabled data sharing. Furthermore, the FAIR Data Maturity Indicators (DMI) endorsed by the Research Data Alliance (RDA) () provide implementation guidance on which indicators to assess for “FAIR-ness”. The TRUST (Transparency, Responsibility, User focus, Sustainability, Technology) principles () describe the sustainability and data stewardship requirements that repositories need to meet to support long-term FAIR-ness (related repository requirements are collected by the CoreTrustSeal, http://www.coretrustseal.org).

The SMM-CD described herein goes beyond assessing FAIR-ness alone by also evaluating the maturity of other aspects of data stewardship practices within the scope of the OAIS Reference Model. As the SMM-CD is domain-specific and has been developed for WMO Member countries, it tries to capture the current stewardship practices applied to individual Earth science datasets. Providing data stewardship maturity information using the SMM-CD primarily supports the Transparency, Responsibility and User focus aspects of the TRUST principles for data repositories ().

The SMM-CD focusses on individual datasets, although these may be hosted in repositories which meet the CoreTrustSeal requirements or follow the TRUST principles. An independent effort is underway to examine the synthesis among the DSMM, the CoreTrustSeal repository requirements and the FAIR data principles (), but this is beyond the scope of this paper.

2.1 Using a Stewardship Maturity Matrix

The availability of a WMO-led maturity matrix allows data stewards (e.g. in National Meteorological and Hydrological Services [NMHSs]) to assess their data management practices within an internationally standardised framework, identifying gaps and other elements of their processes that would benefit most from improvement. It also allows them to identify a target level of stewardship maturity for the data they manage, appropriate to the use cases and the resources available, and provides a roadmap for measuring progress in improving information management capability in support of WMO Programmes.

Once the stewardship maturity of a dataset has been assessed across all aspects, and scores are available, there are a number of different ways both data managers and users of the datasets can use this information. For data managers, having an independent set of assessments across a number of aspects could be useful in identifying where to focus limited resources in improving stewardship quality. There may be some “quick wins” where higher ratings for some aspects require little effort to obtain (and may even be achieved during the assessment process). Furthermore, by contrasting ratings against other similarly well-managed products, the scores from the SMM-CD may even help prioritize cost planning, resource allocation and funding for future data management with the aim of improving stewardship maturity for those datasets. Dataset creators can use the scores similarly when outlining major updates or ensuring stewardship maturity of new datasets.
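The idea of comparing current ratings against a target level to decide where to focus limited resources can be sketched in a few lines of Python. This is an illustrative sketch, not part of the SMM-CD itself: the names `find_gaps` and `AspectGap`, the treatment of unrated aspects as level 0, and the ordering by size of shortfall are assumptions made for the example, and the ratings used are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AspectGap:
    """One aspect whose current rating falls short of its target level."""
    aspect: str
    current: float
    target: float

    @property
    def shortfall(self) -> float:
        return self.target - self.current


def find_gaps(current: dict[str, float], target: dict[str, float]) -> list[AspectGap]:
    """Return aspects rated below their target, largest shortfall first.

    Hypothetical helper: aspects with no current rating are treated as
    level 0 (not yet assessed), which is an assumption of this sketch.
    """
    gaps = [
        AspectGap(aspect, current.get(aspect, 0.0), level)
        for aspect, level in target.items()
        if current.get(aspect, 0.0) < level
    ]
    # Sort by descending shortfall; ties are broken alphabetically.
    return sorted(gaps, key=lambda g: (-g.shortfall, g.aspect))


if __name__ == "__main__":
    # Hypothetical ratings and targets for a national dataset.
    current = {"Discoverability": 3, "Accessibility": 2, "Documentation": 4, "Preservation": 2}
    target = {"Discoverability": 4, "Accessibility": 4, "Documentation": 4, "Preservation": 3}
    for gap in find_gaps(current, target):
        print(f"{gap.aspect}: {gap.current} -> {gap.target}")
```

With these hypothetical inputs the sketch lists Accessibility (two levels short) first, then Discoverability and Preservation (one level short each); Documentation already meets its target and is omitted. Sorting by the largest shortfall is only one choice: a data manager looking for "quick wins" might instead sort by estimated effort.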

There are a number of ways in which data users can, or should, use the scores from this matrix. At a simple, high level, users with minimal requirements can use the scores to choose the dataset with the highest level of maturity for their specific application. Mature datasets and systems make it easy for users to assess which dataset they need. However, users are strongly encouraged to take a more in-depth approach, considering their application as well as the scores for each category and aspect. Datasets which have different aims and processing levels will have different maturity scores, and the most appropriate dataset for a particular user’s application may be one with a lower overall score. For example, when studying sea surface temperatures, a user could take either HadSST3 (, ) or NOAA-ERSSTv5 (, , ), both highly processed, gridded (and in some cases infilled) datasets, neither of which had been assessed using the SMM-CD at the time of writing. Alternatively, one could use the raw ship track and buoy information available in ICOADS R3.0 (), which has been assessed using the SMM-CD. However, the fact that ICOADS has been assessed using the SMM-CD does not necessarily make it the right choice for this particular application. Furthermore, once both HadSST3 and NOAA-ERSSTv5 have been assessed, scores higher than those of ICOADS would not automatically make them the right choice either.

During the construction of the SMM-CD, the differing levels of resources available to data managers in different countries, institutions and groups were taken into account. The first matrix to be developed focussed on datasets which were global in outlook, commonly used internationally, and could be thought of as “high-profile” (SMM-CD, Section 4). Subsequently, an adapted version of the SMM-CD has been constructed, mindful that fewer resources may be available for data management for regional or national operational products, and that such data are likely to have subtly different use cases (SMM-CD_NRP, Section 5).

The assessment process for the SMM-CD and SMM-CD_NRP is a voluntary self-evaluation that the data author or manager can use to identify gaps in the management and stewardship of a dataset. As well as the score for each aspect, there is a field for the evidence supporting the score. The assessment forms (MS Word) are publicly available on Figshare (), along with a guidance booklet to help in completing the evaluation (). Assessed datasets can be published in the WMO Catalogue for Climate Data, in which case the ratings of global datasets are also evaluated cooperatively. The initial 18 datasets in Table 1 were evaluated by the WMO Expert Team on Data Development and Stewardship (ET-DDS) and some of the data managers, using a template assessment form. Each evaluation was discussed with the assessment lead and, where appropriate, the ratings and comments were updated. Table 1 lists the dates on which the assessments were updated following evaluation of the initial assessments. This evaluation process adds to the completeness and quality of the assessments.

Table 1

Details of the datasets which have been assessed using the SMM-CD up to September 2020. An updated list is available in the WMO Climate Data Catalogue of assessed datasets: https://climatedata-catalogue.wmo.int/assessed-datasets.


| DOMAIN | DATASET | INSTITUTION | TYPE | DATE OF ASSESSMENT | WEBPAGE |
|---|---|---|---|---|---|
| Surface temperature | NOAAGlobalTemp v4.0.1 | NOAA | merged land–ocean surface temperature analysis | 2018-10-15, updated 2019-03-12 | https://www.ncdc.noaa.gov/data-access/marineocean-data/noaa-global-surface-temperature-noaaglobaltemp, http://dx.doi.org/10.1175/2007JCLI2100.1 |
| | HadCRUT.4.6.0.0 | Met Office Hadley Centre | gridded dataset | 2019-03-08, updated 2019-03-21 | http://www.metoffice.gov.uk/hadobs/hadcrut4 (v4.5.0.0 also at https://catalogue.ceda.ac.uk/uuid/22a878b3ada24590970974588642f585) |
| | GISTEMP v3 | NASA | surface temperature analysis | 2019-03-09, updated 2020-01-21 | https://data.giss.nasa.gov/gistemp/ |
| Precipitation | GPCC Full Data Monthly | DWD | globally gridded monthly totals | 2019-02-27, updated 2020-06-18 | www.doi.org/10.5676/DWD_GPCC/FD_M_V2018_100 |
| Crowdsourcing (rain, hail & snowfall) | CoCoRaHS | Colorado State University | observations | 2018-10-07, updated 2019-03-29 | https://www.cocorahs.org/ |
| Sea level | GLOSS | IOC | observations | 2018-10-17, updated 2019-04-17 | http://www.gloss-sealevel.org/ |
| | CCI-SeaLevel | ESA | satellite | 2018-10-26, updated 2019-04-30 | https://climate.esa.int/en/projects/sea-level/ |
| | C3S-SeaLevel | Copernicus Climate Change Service | satellite | 2018-10-26, updated 2019-04-30 and 2020-08-31 | https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-sea-level-global?tab=overview |
| Sea ice | SeaIce Index | NSIDC | satellite | 2018-10-24, updated 2019-04-29 | https://nsidc.org/data/g02135, https://doi.org/10.7265/N5K072F8 |
| Ice sheets | GLAS-DEM-500m | NASA-JPL | satellite | 2018-10-24, updated 2019-03-11 | https://nsidc.org/data/nsidc-0304, https://doi.org/10.5067/K2IMI0L24BRJ |
| | GLAS-DEM-1km | NASA-JPL | satellite | 2018-10-24, updated 2019-03-18 | https://nsidc.org/data/nsidc-0422, https://doi.org/10.5067/H0FQ1KL9NEKM |
| | Antarctica-GRACE | NASA-JPL | satellite | 2018-10-24, updated 2019-03-10 | https://podaac.jpl.nasa.gov/dataset/ANTARCTICA_MASS_TELLUS_GRACE_MASCON_CRI_TIME_SERIES_RL05_V1, https://doi.org/10.5067/TEMSC-ANTS1 |
| | Greenland-GRACE | NASA-JPL | satellite | 2018-10-24, updated 2019-04-29 | https://podaac.jpl.nasa.gov/dataset/GREENLAND_MASS_TELLUS_GRACE_MASCON_CRI_TIME_SERIES_RL05_V1, https://doi.org/10.5067/TEMSC-GRTS1 |
| Glaciers | GLIMS | GLIMS | satellite | 2018-10-24, updated 2019-02-24 | http://www.glims.org/ |
| Climate extremes indices | HadEX2 | Met Office Hadley Centre | observations and model data | 2018-05-07, updated 2019-06-16 | www.climdex.org, https://doi.org/10.1002/jgrd.50150 |
| Hydrology | GRDC | Bundesanstalt fuer Gewaesserkunde | observations | 2018-10-05, updated 2019-03-19 | https://www.bafg.de/GRDC/EN/01_GRDC/grdc_node.html |
| Marine | WOD13 | NOAA and IODE | observations | 2018-09-12, updated 2019-03-29 | https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html, https://www.ncei.noaa.gov/products/world-ocean-database |
| | ICOADS | NOAA | simple gridded monthly summary products | 2018-11-02, updated 2019-03-27 | https://icoads.noaa.gov/, http://dx.doi.org/10.1002/joc.4775 |

3 Development Process

The development of the SMM-CD and SMM-CD_NRP stems from the outcomes of two meetings of international domain experts focusing on information management and climate data modernisation. We give details of these meetings and of the subsequent development, to demonstrate the provenance of these matrices and the high level of consultation and discussion that formed part of their construction.

The first, the “WMO Workshop on Information Management” (), held in Geneva in October 2017, aimed to develop WMO-wide guidance on information management, to make progress in identifying datasets with good data management, and to enhance the accessibility and visibility of those datasets on the WMO Information System (WIS). Attendees represented a wide range of institutions and WMO Member states and covered many specialisms. The workshop recommended that the WMO Commission for Climatology (CCl) develop a catalogue of datasets based on best practices in current maturity models (e.g. for use in the monitoring of key climate indicators) and also provide non-technical users with improved access to data. Furthermore, an Inter-Programme Expert Team on the Climate Data Modernization Programme (IPET-CDMP) was tasked with managing the High-Quality Global Data Management Framework for Climate (HQ-GDMFC). It was recognised that, in order to determine the maturity of climate datasets, they would need to be assessed against a maturity index in a process agreed by the WMO.

The second, the “WMO Expert Meeting on Climate Data Modernisation” (), held at KNMI in April 2018, aimed to develop a WMO-wide SMM-CD based on the existing US and European maturity matrix models outlined above. Datasets assessed using this matrix would form part of a WMO Climate Data Catalogue; discovery and access protocols for the WIS and for search engines, to assist non-technical users, were also scoped and developed at this meeting. The subject matter specialists discussed and refined the contents of the matrix, and discussed which datasets were to be used for real-world testing.

Further refinements and updates to the SMM-CD were put in place over the following two years, taking into account feedback from data managers, dataset developers and subject matter experts from the WMO ET-DDS and the International Expert Group on Climate Data Modernisation (IEG-CDM). In June of 2019, the Manual on HQ-GDMFC, the SMM-CD and the WMO Climate Data Catalogue were endorsed at the 18th WMO Congress.

4 The Stewardship Maturity Matrix for Climate Data (SMM-CD)

The SMM-CD has been developed with the intention of it being used as a self-assessment tool, with some external moderation and guidance. Full details about the SMM-CD are given in the Guidance Document () as well as on the Assessment Template (see Resources), but we give an overview here.

There are four categories in the SMM-CD, each containing two or three aspects (Figure 2). A score is determined for each aspect, corresponding to the maturity scale outlined in Figure 3. The Guidance Document gives more detailed examples for most of the aspects at each maturity level to help with the self-assessment process.

Figure 2 

Diagram of SMM-CD Categories and Aspects. Based on Fig. 2 in Peng et al. ().

Figure 3 

The maturity scale structure for the WMO SMM-CD. Based on Fig. 1 in Peng et al. ().

At the lowest maturity level, few or no procedures and processes are defined or in place, or they are not reported or are (poorly) documented. This “ad-hoc” level could be, for example, an individual researcher creating and storing files locally on their own hard disc. At higher levels, increasingly managed and supported processes are in place across the aspects, up to an optimal level of stewardship maturity at which many processes are demonstrably compliant with international standards.

Throughout the SMM-CD, WMO-defined requirements and standards are recommended where they are applicable. A data product should be rated at the highest level for which all the descriptors at that level and at every lower level are satisfied. In some cases, a fraction may be added to indicate that one or more criteria at a higher level are also satisfied. All assessment ratings come with supporting information to justify the level scored. So far, 18 highly utilised global climate datasets have been assessed (Table 1), and the assessment results were reviewed by the members of the WMO ET-DDS.
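The rating rule described in the previous paragraph can be expressed as a short function. In this sketch, which is our interpretation rather than an official WMO implementation, each maturity level is represented by a list of booleans, one per descriptor. The aspect is rated at the last level for which that level and all lower levels are fully satisfied, and a fractional step is added if any descriptor at a higher level is also met. The size of the fraction is not fixed by the text; 0.5 is assumed here because half-point grades such as 4.5 appear in the case study in Section 6.1. The name `rate_aspect` is hypothetical.

```python
from __future__ import annotations


def rate_aspect(criteria_met: list[list[bool]], fraction: float = 0.5) -> float:
    """Rate one aspect from per-level descriptor results (hypothetical helper).

    criteria_met[i] lists, for maturity level i + 1, whether each descriptor at
    that level is satisfied. The rating is the highest level such that that
    level and every lower level are fully satisfied. If any descriptor above
    that level is also satisfied, `fraction` is added (0.5 is an assumed step).
    """
    rating = 0
    for level, results in enumerate(criteria_met, start=1):
        if not all(results):
            # This level is incomplete, so the rating stops at the previous level.
            # Give partial credit if any descriptor at this or a higher level is met.
            partial = any(any(r) for r in criteria_met[level - 1:])
            return rating + fraction if partial else float(rating)
        rating = level
    return float(rating)


if __name__ == "__main__":
    # Levels 1-3 fully met, one of two level-4 descriptors met, nothing at level 5.
    example = [[True], [True, True], [True, True], [True, False], [False]]
    print(rate_aspect(example))  # 3.5
```

Under this reading, a single missing descriptor at an intermediate level caps the rating there, however much is achieved at higher levels; the higher-level achievements only earn the fractional step.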

It should be noted that dataset maturity ratings are a snapshot of the current state, which may evolve over time. The requirements or standards against which the maturity of a dataset is evaluated should be described in the assessment report prepared by the dataset point of contact or an evaluator.

4.1 Data Access

This category refers to the ability of users to find and then obtain the dataset, with higher levels reflecting the ease with which this is possible. At the lowest levels, personal contact is required both to learn of the data and then to receive them, whereas at the highest levels, datasets are available through international catalogues and portals, with the ability to restrict the download to just the fields of interest.

  • Discoverability – how easy it is to find the data
  • Accessibility – how easy it is to obtain the data

4.2 Usability and Usage

This category refers both to how easily the data product can be used and to how impactful uses of the data product have been at the time of assessment. There are two aspects to usability, data portability and documentation, both of which are necessary for a dataset to be usable. At the higher levels, data products would be available in a number of standard formats, with documentation extending to tutorials and even the complete production system being available.

Including impact in this category is helpful to users, as datasets which have attracted many citations or have been used in assessment reports may indicate a higher level of maturity than other, less widely used, products. However, recently released datasets (including updates to existing datasets) are likely to have few citations, and citation levels also vary between disciplines.

  • Data Portability – ranges from inability to transfer data in computerized form to fully machine-readable and interoperable.
  • Documentation – assesses the extent and accessibility of information on how to use the dataset or product, and hence users’ ability to determine its fitness for purpose.
  • Usage & Impact – where the dataset has been used, and how high-profile and impactful these uses have been.

4.3 Quality Management

The quality management category separates quality control (QC) and quality assurance (QA) from other aspects, such as quality assessment and data integrity. In the SMM-CD, QC comprises the set of routines and checks run by the dataset creators as an integral part of constructing the dataset. Separately, QA is the set of checks and processes in place to ensure that the construction method is robust, for example code review and version control. In contrast, quality assessments are analyses which make (quasi-)independent investigations into, e.g., the sources of uncertainty in a dataset.

The data integrity aspect reflects tools available on the download pages and access portals to ensure that the data requested are the data received.

  • Quality Assurance and Control Procedure – level and accessibility of the QA and QC procedures.
  • Quality Assessment – separate assessment of dataset quality and limitations
  • Data Integrity – monitors dataflow and ingest processes to ensure data requested is data received.
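One common way of checking that the data received are the data requested is to compare a cryptographic checksum of the downloaded file with a checksum published by the data provider. The sketch below uses SHA-256 from Python's standard `hashlib` module. The SMM-CD does not prescribe a particular algorithm, and the helper names `sha256_of` and `verify_download`, as well as the assumption that the provider publishes a hex checksum string, are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so that large data files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, published_checksum: str) -> bool:
    """True if the file's SHA-256 matches the checksum published by the provider."""
    # Normalise case and surrounding whitespace in the published value.
    return sha256_of(path) == published_checksum.strip().lower()
```

A download portal could publish such checksums alongside each file, so that users, or automated ingest processes, can run a check like this after every transfer.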

4.4 Data Management

This category takes an overview of the dataset and assesses the procedures, protocols and policies that exist to ensure the data product has sufficient longevity to be useful. Lower scores indicate a greater risk that the dataset could become unusable or be lost, whereas higher scores indicate a lower risk.

Unsurprisingly, the preservation aspect assesses how well the data products are archived in case of system failures, personnel changes and the like. Although some metadata facets are assessed in the Usability and Usage category, and in some cases and contexts also in the Discoverability aspect, other types of metadata are also important, for example those describing the provenance of the dataset. Here, metadata are important at every level, from the collection level (e.g. a dataset consisting of a number of files or quantities) down to the granule level (the smallest manageable piece, e.g. a file within a dataset or a weather station history).

Finally, dataset governance structures ensure that processes such as read-write permissions for the creation and release of datasets have been adhered to, indicating a formal approach rather than more ad-hoc efforts.

  • Preservation – assesses the security of the data such as backup procedures and retention policies.
  • Metadata – covers how detailed the descriptive metadata are, online availability of this information, and extent of compliance with international standards. It is also important for users to have up to date contact information for the dataset.
  • Governance – refers to the extent to which controls, accountabilities and compliance mechanisms are put in place, and their adherence with community best-practice.
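The four categories and eleven aspects described in Sections 4.1 to 4.4 lend themselves to a simple machine-readable representation, for example to check that a self-assessment is complete before it is submitted. The following Python sketch is illustrative only: the data structure, the `validate_assessment` helper and the assumed 1 to 5 scale in half-point steps (inferred from the grades in the case study of Section 6.1) are not part of the official assessment template, which is an MS Word form. Requiring non-empty evidence mirrors the requirement that every rating comes with supporting information.

```python
from __future__ import annotations

from dataclasses import dataclass

# SMM-CD categories and their aspects, as listed in Sections 4.1-4.4.
SMM_CD_ASPECTS: dict[str, tuple[str, ...]] = {
    "Data Access": ("Discoverability", "Accessibility"),
    "Usability and Usage": ("Data Portability", "Documentation", "Usage & Impact"),
    "Quality Management": (
        "Quality Assurance and Control Procedure", "Quality Assessment", "Data Integrity",
    ),
    "Data Management": ("Preservation", "Metadata", "Governance"),
}


@dataclass(frozen=True)
class AspectRating:
    score: float    # assumed 1-5 scale in half-point steps
    evidence: str   # supporting information justifying the score


def validate_assessment(ratings: dict[str, AspectRating]) -> list[str]:
    """Return a list of problems; an empty list means the record looks complete."""
    problems = []
    expected = {a for aspects in SMM_CD_ASPECTS.values() for a in aspects}
    for aspect in sorted(expected - set(ratings)):
        problems.append(f"missing rating: {aspect}")
    for aspect in sorted(set(ratings) - expected):
        problems.append(f"unknown aspect: {aspect}")
    for aspect, rating in sorted(ratings.items()):
        # Reject scores outside 1-5 or not on a half-point step.
        if not 1 <= rating.score <= 5 or (rating.score * 2) % 1:
            problems.append(f"invalid score: {aspect}")
        if not rating.evidence.strip():
            problems.append(f"no evidence: {aspect}")
    return problems
```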

5 The Stewardship Maturity Matrix for Climate Data for National and Regional Purposes (SMM-CD_NRP)

The rationale for developing a version of the SMM-CD for national and regional datasets is to address the operational focus of data management at the NMHSs, whose primary mission is to make national and regional datasets available to users. The SMM-CD_NRP therefore has two main categories: Operational Data Management and Data Stewardship. These best inform the NMHSs on how to manage their data according to best management practices and standards at the national and regional levels. As with the SMM-CD, the goal is to provide NMHSs with a user-friendly self-assessment tool to help determine gaps in their data management and stewardship, and to provide a structure for moving towards improved practices and standards so as to attain a satisfactory level of competence in data management and stewardship.

The SMM-CD_NRP was heavily influenced by the structure of the SMM-CD, but with some simplification and merging of the categories, aspects and maturity levels to make them more appropriate for national and regional purposes. In this section we outline the differences between the SMM-CD_NRP and the SMM-CD. Full details of the SMM-CD_NRP are given in its Guidance Booklet and the Assessment Template (see Resources).

Figure 4 presents the structure of the maturity scale of the SMM-CD_NRP. In contrast to the SMM-CD, there are only three main levels, with an optional fourth, “highly desirable”, level. Although the progression of increasing stewardship maturity remains, the stewardship maturity required at each level is lower, reflecting the target users of the SMM-CD_NRP.

Figure 4 

The maturity scale structure for the WMO SMM-CD_NRP.

The two categories of the SMM-CD_NRP both have a number of aspects, as outlined in Figure 5. In the Operational Data Management category there are five assessment aspects: Data Access, Data Portability, Data Preservation, Documentation and Data Integrity; and in the Data Stewardship category there are three aspects: Quality and Usage, Governance, and Metadata. Some of the aspects are very similar to those in the SMM-CD, whereas others are blends of two or more. Below we only go into detail for the few aspects which are different or unique in the SMM-CD_NRP.

Figure 5 

Diagram of SMM-CD_NRP Categories and Aspects.

5.1 Operational Data Management

This category addresses the assessment of the operations required to enable access, portability, archival and documentation, and to ensure data integrity, in the NMHS data holdings. Most aspects under this category are very similar to those in the SMM-CD. The aspect with the greatest difference is Data Access, which assesses the extent to which a dataset can be found and accessed. The SMM-CD_NRP uses this single aspect in place of the two aspects (Discoverability and Accessibility) under the Data Access category of the SMM-CD.

5.2 Data Stewardship

The second category of the SMM-CD_NRP provides a rating of how well a dataset is stewarded, by assessing the quality and usage, governance, and metadata aspects. The last two of these are again very similar to those in the SMM-CD. However, here the Quality Assurance and Control, Quality Assessment and Usage aspects have been blended together. The resulting Quality and Usage aspect assesses the degree to which robust quality control is carried out on the data, along with quality flagging or error estimates, and the extent to which scientific peers trust the data when conducting research and compiling reports.
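The relationship between the SMM-CD_NRP aspects and the SMM-CD aspects described in Sections 5.1 and 5.2 can be written down as a mapping. In this sketch the two blended aspects (Data Access and Quality and Usage) follow the text directly; the one-to-one entries for the remaining aspects are our reading of the statement that they are very similar to their SMM-CD counterparts, and the variable and function names are hypothetical.

```python
from __future__ import annotations

# SMM-CD_NRP categories and aspects (Section 5, Figure 5).
NRP_ASPECTS: dict[str, tuple[str, ...]] = {
    "Operational Data Management": (
        "Data Access", "Data Portability", "Data Preservation", "Documentation", "Data Integrity",
    ),
    "Data Stewardship": ("Quality and Usage", "Governance", "Metadata"),
}

# SMM-CD aspects that each SMM-CD_NRP aspect draws on (interpretation, see text).
NRP_FROM_SMM_CD: dict[str, tuple[str, ...]] = {
    "Data Access": ("Discoverability", "Accessibility"),
    "Data Portability": ("Data Portability",),
    "Data Preservation": ("Preservation",),
    "Documentation": ("Documentation",),
    "Data Integrity": ("Data Integrity",),
    "Quality and Usage": (
        "Quality Assurance and Control Procedure", "Quality Assessment", "Usage & Impact",
    ),
    "Governance": ("Governance",),
    "Metadata": ("Metadata",),
}


def blended_aspects() -> dict[str, tuple[str, ...]]:
    """NRP aspects that merge two or more SMM-CD aspects."""
    return {nrp: src for nrp, src in NRP_FROM_SMM_CD.items() if len(src) > 1}


def smm_cd_aspects_covered() -> set[str]:
    """All SMM-CD aspects that feed into at least one NRP aspect."""
    return {aspect for sources in NRP_FROM_SMM_CD.values() for aspect in sources}


if __name__ == "__main__":
    nrp_total = sum(len(a) for a in NRP_ASPECTS.values())
    print(f"{nrp_total} NRP aspects cover {len(smm_cd_aspects_covered())} SMM-CD aspects")
    for nrp, sources in blended_aspects().items():
        print(f"{nrp} <- {', '.join(sources)}")
```

Under this reading, the eight SMM-CD_NRP aspects still touch all eleven SMM-CD aspects, so the simplification comes from merging aspects rather than dropping topics.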

6 Applying the SMM-CD/_NRP – Case Studies

Tables 1 and 3 list the datasets that have been assessed so far (as of 30 September 2020) using the SMM-CD and SMM-CD_NRP respectively. A complete and up-to-date list is kept in the WMO Climate Data Catalogue (https://climatedata-catalogue.wmo.int/assessed-datasets). The aspiration and intention is that, over time, more datasets from a greater range of domains and platforms relevant to the community that WMO serves will be assessed. This falls within the remit of the Expert Team on Data Requirements for Climate Services (ET-DRC) of the newly formed WMO Services Commission (SERCOM). The assessment procedure is a living process, so as datasets are updated, their ratings are likely to change. We now outline in more detail a number of case studies showing how the matrices can be applied in practice.

6.1 Copernicus Climate Change Service (C3S) Sea Level Dataset (SMM-CD V4)

The European Union’s Copernicus programme aims to provide environmental observations of the Earth system for the ultimate benefit of all European citizens. The mission of the Copernicus Climate Change Service (C3S) is to provide consistent and authoritative information about climate change. The sea level Essential Climate Variable (ECV) product is of particular interest, since sea level rise is one of the major consequences of climate change, and it is therefore essential to monitor sea level changes on global and regional scales. In this context, a sea level dataset based on satellite altimetry is available to users through the C3S Climate Data Store (CDS, https://cds.climate.copernicus.eu/). This daily, multi-mission, merged, gridded dataset of sea level anomalies has been designed to ensure the stability and homogeneity of the time series. It starts in 1993 and will be extended three times per year.

The data provider and several ET-DDS members used the SMM-CD to assess the data management of this Climate Data Record (summarised in Table 2). In the ‘Data Access’ category, the ‘Discoverability’ aspect is graded 5, since the dataset is searchable and easily available through the online institutional C3S catalogue, and ‘Accessibility’ is graded 4.5: access is provided through the CDS interface and the associated toolbox, but no spatial sub-setting is possible when downloading the data and all variables have to be downloaded together. In the ‘Usability and Usage’ category, ‘Data Portability’ is graded 4.5, since the data are distributed as NetCDF (Network Common Data Form; ) files compliant with the Climate and Forecast (CF) Metadata Conventions (https://cfconventions.org/), but no other format is available. A grade of 5 is given for the ‘Documentation’ and ‘Usage’ aspects, since the dataset is fully documented and has been referenced in international climate assessments and published reports. In the ‘Quality Management’ category, the quality assurance procedure is fully documented, with an additional independent evaluation and quality control (EQC, ) performed by the Copernicus service. Target requirements and a detailed gap analysis are available, and details of the error budget have been published in peer-reviewed journals, leading to a grade of 5 for both the ‘Quality Assurance and Control Procedure’ and ‘Quality Assessment’ aspects. The same grade of 5 is given for ‘Data Integrity’, which is systematically verified with a standard approach to ensure that the distributed data are identical to the initial data files. Finally, in the ‘Data Management’ category, ‘Data Preservation’ is graded 4, since the data are distributed on an institutionally maintained platform and are archived following a defined and implemented procedure that agrees with community standards.
A grade of 5 is given for both the ‘Metadata’ and ‘Governance’ aspects, since the dataset is distributed with comprehensive metadata, detailed documentation and a versioning system, and governance aspects are well defined within the E.U. Copernicus programme and are compliant with international standards.

Table 2

Scores and evidence for each aspect of the Copernicus Climate Change Service (C3S) sea level dataset, as assessed with the SMM-CD. These are taken from the assessment document available through the WMO Climate Data Catalogue.


CATEGORY | ASPECT | SCORE ACHIEVED | SCORE EVIDENCE

Data Access | Discoverability | 5 | The dataset is discoverable in the C3S online searchable Climate Data Store (CDS, https://cds.climate.copernicus.eu/), including an overview and metadata description. Operational production is maintained, and temporal extensions are routinely provided. Procedures for data integration in the catalogue are defined and applied.

Data Access | Accessibility | 4.5 | Data are available through the institutional C3S CDS web interface, with the possibility to select the period of interest. However, no spatial sub-setting is possible, all available variables must be downloaded together, and data are made available through ftp-based pull-mode access. Visualization is possible through the CDS toolbox (https://cds.climate.copernicus.eu/cdsapp#!/toolbox).

Usability & Usage | Data Portability | 4.5 | Data format is NetCDF-4, following the Climate and Forecast (CF) conventions. The CDS toolbox is available and allows further processing and customization of the data by users.

Usability & Usage | Documentation | 5 | Documentation based on a standard C3S template is available online with a unique ID and version number. The production system is fully described in the documentation. Altimetry tutorials are available online (http://www.altimetry.info/), and use cases produced with the C3S toolbox are in the process of being published.

Usability & Usage | Usage and Impact | 5 | Sea level rise is a direct consequence of climate change, and the altimeter sea level time series is cited in numerous peer-reviewed publications (CMEMS OSR#4: https://marine.copernicus.eu/wp-content/uploads/2020/06/OSR4_Summary_WEB_SinglePages.pdf), in institutional reports (C3S European State of the Climate Report: https://climate.copernicus.eu/ESOTC) and in international climate assessment reports (ESA SL_CCI http://www.esa-sealevel-cci.org/webfm_send/584, IPCC SROCC 2019: www.ipcc.ch/srocc/), and is also used in the policy-making process (WMO State of the Global Climate report: https://library.wmo.int/doc_num.php?explnum_id=10211).

Quality Management | Quality Assurance and Control Procedure | 5 | QA/QC procedures are fully documented and applied to the full historical record and to the regular temporal extensions. Estimated accuracy numbers are available, derived from published studies of error characterization. The C3S EQC component aims to inform users about the fitness for purpose of the datasets through an independent approach. Target requirements and gap analysis are available, and a dedicated user service desk considers user feedback (https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-sea-level-global?tab=doc).

Quality Management | Quality Assessment | 5 | The product quality assurance procedure and assessment report are available in the data documentation. The dataset is produced and distributed within the European C3S. A detailed error budget has been produced, leading to uncertainty characterization, and the results have been published in peer-reviewed journals (; ; ; ).

Quality Management | Data Integrity | 5 | Data files are copied from the production server to the diffusion platform using the Unix “rsync” command, which includes a ‘checksum’ verification step. Data integrity is thus systematically verified with a standard approach to ensure that the data received, archived and disseminated conform to the initial data files.

Data Management | Preservation | 4 | The C3S sea level data are distributed on an institutionally maintained platform. The architecture of the equipment required for the production, diffusion and backup systems is defined and described in the public technical documentation. The diffusion server consists of a main server and a redundant one (hosted in a separate backup data centre). The data are systematically stored, saved and archived in a secured internal repository, following a defined and implemented procedure that conforms to community standards.

Data Management | Metadata | 5 | The metadata available in each data file are compliant with international standards and support dataset provenance. Metadata are updated following each evolution of the input data. The input data are described in the Product User Guide and Specification (http://datastore.copernicus-climate.eu/documents/satellite-sea-level/D3.SL.1-v1.2_PUGS_of_v1DT2018_SeaLevel_products_v2.4.pdf), so that the data product can be linked to the version of the data from which it was derived.

Data Management | Governance | 5 | The responsibility for data production is clearly defined within the E.U. C3S, and a point of contact is clearly defined. The entity in charge of managing the data production and delivery service is audited annually.
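The ‘Data Integrity’ evidence in Table 2 rests on a checksum comparison, performed by rsync, between the files on the production server and their copies on the diffusion platform. As an illustration of the principle only, the Python sketch below hashes each source file and its copy with SHA-256 and reports any file that is missing or differs. The hash algorithm, function names and directory layout are assumptions for illustration and do not describe the actual C3S set-up.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_dir: Path, copy_dir: Path) -> list[str]:
    """List files under source_dir whose copy is missing or has a different digest.

    An empty result means every source file was copied unchanged.
    """
    mismatches = []
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        replica = copy_dir / relative
        if not replica.is_file() or file_digest(source) != file_digest(replica):
            mismatches.append(relative.as_posix())
    return mismatches


if __name__ == "__main__":
    # Hypothetical demo: one file copied intact, one corrupted in transit.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        src_dir, dst_dir = Path(src), Path(dst)
        for name, payload in [("a.nc", b"sea level"), ("b.nc", b"anomaly")]:
            (src_dir / name).write_bytes(payload)
        (dst_dir / "a.nc").write_bytes(b"sea level")
        (dst_dir / "b.nc").write_bytes(b"an0maly")
        print(verify_copy(src_dir, dst_dir))  # ['b.nc']
```

Reading files in fixed-size chunks keeps memory use bounded, which matters for large gridded NetCDF archives.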

Table 3

Details of the datasets which have been assessed by the SMM-CD_NRP up to September 2020.


DOMAIN | DATASET | INSTITUTION | TYPE | DATE OF ASSESSMENT | WEBPAGE | NOTES

Brazil | Temperature, Precipitation and Humidity | INMET | Automatic and Manual Weather Stations | 24.09.2020 | https://bdmep.inmet.gov.br | The annual number of stations varies according to the year of implementation of automatic stations and removal of conventional stations

Canada | Daily maximum and minimum temperatures, monthly mean temperature | Environment and Climate Change Canada | Homogenized data time series using observations from manual and automatic stations | 11.08.2020 | Daily data at http://crd-data-donnees-rdc.ec.gc.ca/CDAS/products/EC_data/AHCCD_daily/ and monthly data at ftp://ccrp.tor.ec.gc.ca/pub/AHCCD/ |

Germany | QuWind100 | DWD | model data | 20.08.2020 | https://www.dwd.de/DE/leistungen/quwind100/qu-wind_100.html | Download at https://opendata.dwd.de/climate_environment/CDC/grids_germany/multi_annual/wind_parameters/Project_QuWind100/

Germany | Radar-based Precipitation Climatology for Germany | DWD | remote sensing data | 20.08.2020 | https://opendata.dwd.de/climate_environment/GPCC/radarklimatologie/ | Download at https://www.doi.org/10.5676/DWD/RADKLIM_RW_V2017.002

France | DRIAS 2020 Climate simulations corrected over Metropolitan France | Météo-France | Climate simulation | 04.09.2020 | http://www.drias-climat.fr/ |

6.2 National Precipitation and Wind Speed Dataset for Germany (SMM-CD_NRP)

Deutscher Wetterdienst (DWD) is the national meteorological service of Germany. To fulfil its legal tasks, it operates, among other things, an in-situ station network and a radar network for precipitation monitoring, which currently comprises 17 radar systems. DWD also operates a suite of numerical weather prediction (NWP) models.

6.2.1 Radar-based Precipitation Climatology for Germany

DWD archives its radar data, which allows climatological reprocessing of this very high-resolution observational dataset (up to 1 km spatial and 5-minute temporal resolution). The production steps for this dataset can be summarised as follows. First, the radar reflectivity was automatically quality controlled to remove or correct artefacts such as clutter and spikes. Next, the radar data were adjusted to quality-controlled rain-gauge observations; this procedure includes a verification step in which a subset of gauges not used in the original adjustment serves to evaluate and improve it (). The dataset was developed within the research project RADKLIM () and has since been transferred into an operational data product. The motivation was the assumption that gauges miss the majority of heavy precipitation events, which has since been confirmed by Lengfeld et al. (). The dataset starts in 2001 and is extended annually with the observations of the previous year.
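The adjustment-and-verification idea described above can be illustrated with a deliberately simple stand-in. The Python sketch below fits a single multiplicative mean-field bias factor on one set of gauges and checks it on gauges withheld from the fit. This is a swapped-in textbook technique, not the RADKLIM adjustment itself, and the function names and values are hypothetical.

```python
from statistics import mean


def mean_field_bias(gauge_mm: list[float], radar_mm: list[float]) -> float:
    """Ratio of total gauge to total radar precipitation at the adjustment gauges."""
    radar_total = sum(radar_mm)
    if radar_total <= 0:
        return 1.0  # no radar precipitation at the gauges: leave the field unadjusted
    return sum(gauge_mm) / radar_total


def mean_absolute_error(observed: list[float], estimated: list[float]) -> float:
    return mean(abs(o - e) for o, e in zip(observed, estimated))


def adjust_and_verify(adj_gauge, adj_radar, ver_gauge, ver_radar):
    """Fit the bias factor on one gauge subset and evaluate it on an independent subset.

    Returns the factor and the verification error before and after adjustment.
    """
    factor = mean_field_bias(adj_gauge, adj_radar)
    adjusted = [factor * r for r in ver_radar]
    return (
        factor,
        mean_absolute_error(ver_gauge, ver_radar),
        mean_absolute_error(ver_gauge, adjusted),
    )


# Synthetic accumulations (mm) in which the radar underestimates by a factor of two.
print(adjust_and_verify([10.0, 20.0, 30.0], [5.0, 10.0, 15.0], [8.0, 12.0], [4.0, 6.0]))
# (2.0, 5.0, 0.0)
```

Keeping the verification gauges out of the fit is what makes the error reduction an honest check rather than a restatement of the fit.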

The Radar-based Precipitation Climatology for Germany was assessed by a member of the IEG-CDM in consultation with the dataset creators. The overall rating is level 2.4 for the ‘Operational Data Management’ category and level 2.3 for ‘Data Stewardship’. Regarding the individual aspects, ‘Data Access’ has level 2, as the dataset is distributed through the national climate data portal via sftp/https and is downloadable in monthly chunks. As it is available as binary and ASCII files with a GIS header and a manual, ‘Data Portability’ has level 3, with ‘Data Preservation’ at the same level, as the backup procedure follows institutional archiving practices. The ‘Documentation’ aspect also reaches level 3, as the dataset was created within a collaborative research project, and interim and final reports as well as papers describe its creation. ‘Data Integrity’ has level 1, as no file-integrity checks were applied during the copy process. The ‘Quality and Usage’ aspect reaches level 3, as the data were quality controlled by means of a defined and reviewed QC procedure. The ‘Governance’ and ‘Metadata’ aspects both achieve level 2, as the dataset follows standard procedures with defined accountabilities, and user contact information is provided together with the coordinates and projection of the grid.

6.2.2 QuWind100

Germany is striving to increase its renewable energy capacity, especially the share of photovoltaics and wind energy (). Among other datasets developed for this sector, DWD developed QuWind100, a dataset of mean wind speed at 100 and 200 metres above ground (typical hub heights of wind turbines). The dataset can be used to identify suitable locations for wind turbines and to estimate expected yields. A reference period of 1981 to 2010 was chosen, and projections for 2021 to 2050 were estimated under the RCP8.5 emissions scenario. A novel model chain combining the mesoscale model COSMO-CLM and the boundary-layer model HIRVAC2D was developed to bridge the gap between the available in-situ observations and the data needed (). The dataset was developed in a collaborative research project of DWD and the Technische Universität Dresden.

QuWind100 was also assessed by a member of the IEG-CDM in consultation with the dataset creators. The rating is level 3 for the ‘Operational Data Management’ category and level 2.3 for the ‘Data Stewardship’ category. Among the aspects, ‘Data Access’ and ‘Data Portability’ reach level 4, as the data are available via the German Climate Data Centre in the OpenData portal and also via web services, as NetCDF and comma-separated values (CSV) files, for the whole country as well as for user-defined subregions, in seasonal and annual chunks. Level 3 is reached by the ‘Data Preservation’ and ‘Documentation’ aspects, as a backup copy is held following institutional practices and a final report of the research project, including a verification, is available. As no systematic data integrity checks were performed, the ‘Data Integrity’ aspect achieves level 1. The ‘Quality and Usage’ aspect reaches level 3, as the input data from stations operated by DWD pass through a defined, documented and audited QC routine. The ‘Governance’ and ‘Metadata’ aspects reach level 2, as basic information such as the latitude and longitude of the grid cells and the elevation above ground is provided, accountabilities are defined and user contact information is given.

6.3 Third Generation of Homogenized Temperature Datasets (Monthly and Daily) for Canada (SMM-CD_NRP)

The Climate Research Division of Environment and Climate Change Canada has produced the third generation of homogenized (surface air) temperature datasets (). These include homogenized versions of both daily minimum and maximum temperatures, as well as the derived monthly means, for 780 locations across Canada, the majority of which are currently active stations with long data records (). This is a research data product suitable for climate trend analysis. The production procedure includes quality control; adjustment of daily minimum temperatures to reduce the effects of the 1961 change in observing time; joining of data records from nearby sites to form long time series; homogeneity testing of each time series of annual and seasonal mean temperatures; and homogenization of each daily and monthly data series by adjusting the raw series to reduce all the non-climatic shifts identified in it ().
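The final homogenization step can be illustrated, in much simplified form, by a single mean-shift adjustment. The Python sketch below assumes that homogeneity testing has already located one breakpoint and that a homogeneous neighbouring reference series is available; both, together with all names and values, are hypothetical, and the procedure actually used for the Canadian datasets is considerably more elaborate.

```python
from statistics import mean


def step_at_breakpoint(candidate, reference, breakpoint):
    """Size of the shift in the candidate-minus-reference difference series.

    Differencing against a reference removes the shared climate signal, so a
    step in the differences points to a non-climatic change at the station.
    """
    diff = [c - r for c, r in zip(candidate, reference)]
    return mean(diff[breakpoint:]) - mean(diff[:breakpoint])


def adjust_earlier_segment(candidate, reference, breakpoint):
    """Shift values before `breakpoint` so they are consistent with the recent segment."""
    shift = step_at_breakpoint(candidate, reference, breakpoint)
    adjusted = [v + shift if i < breakpoint else v for i, v in enumerate(candidate)]
    return adjusted, shift


# Hypothetical annual means (deg C): a warming reference, and a candidate that
# read 0.75 deg C too low before a site change at index 3.
reference = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
candidate = [9.25, 9.75, 10.25, 11.5, 12.0, 12.5]
adjusted, shift = adjust_earlier_segment(candidate, reference, breakpoint=3)
print(shift, adjusted)  # 0.75 [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
```

Adjusting the earlier segment towards the most recent one keeps current observations unchanged, while the warming trend shared with the reference survives the correction.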

The data stewardship assessment with the SMM-CD_NRP was made by the data producer team, one of whom is an ET-DDS member. The averaged rating level is 2.4 for Operational Data Management and 2.3+ for Data Stewardship, with 3.0 being the maximum level of the mandatory maturity criteria. Specifically, the Data Access, Portability and Integrity aspects received a Level 2 rating, because (1) the datasets are available via ftp only as a whole, without enhanced online data services (e.g., there is no option to download a subset of the dataset); (2) the data and metadata are provided separately in ASCII format, which is not directly usable in a geospatial environment such as ArcGIS; and (3) only random data integrity checks were applied during production. The Data Preservation and Documentation aspects received a Level 3 rating, because (1) both the raw and homogenized data, as well as the computer programs used to produce the datasets, are routinely backed up on several servers; and (2) documentation on the methods used to produce the datasets (including the related published journal papers) and on the data format and filename convention is available online. The Quality and Usage aspect of Data Stewardship received a level 3+ rating, because the datasets were produced with comprehensive QA/QC and homogenization procedures, have been well cited in peer-reviewed journals and are used by other well-known climate data centers. The Governance and Metadata aspects received a Level 2 rating, because (1) a standard procedure and approval process were followed, along with accountabilities and a compliance mechanism for ensuring that the data are secure, accessible and usable; and (2) limited collection-level metadata are available to users but do not conform to community standards.
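The averaged category ratings reported in Sections 6.2 and 6.3 are consistent with an unweighted arithmetic mean of the aspect levels, rounded to one decimal place. The Python sketch below assumes exactly that rule, which is not spelled out in the text, and drops the ‘+’ qualifiers; it recomputes the category ratings from the aspect levels quoted in the case studies. The dictionary keys and function name are illustrative.

```python
from statistics import mean

OPERATIONAL = ("Data Access", "Data Portability", "Data Preservation",
               "Documentation", "Data Integrity")
STEWARDSHIP = ("Quality and Usage", "Governance", "Metadata")


def category_rating(levels: dict[str, int], aspects: tuple[str, ...]) -> float:
    """Unweighted mean of the aspect levels in one category, to one decimal place."""
    return round(float(mean(levels[a] for a in aspects)), 1)


# Aspect levels as reported in the case studies ('+' qualifiers dropped).
assessments = {
    "Radar-based Precipitation Climatology": {
        "Data Access": 2, "Data Portability": 3, "Data Preservation": 3,
        "Documentation": 3, "Data Integrity": 1,
        "Quality and Usage": 3, "Governance": 2, "Metadata": 2},
    "QuWind100": {
        "Data Access": 4, "Data Portability": 4, "Data Preservation": 3,
        "Documentation": 3, "Data Integrity": 1,
        "Quality and Usage": 3, "Governance": 2, "Metadata": 2},
    "Canadian homogenized temperature": {
        "Data Access": 2, "Data Portability": 2, "Data Preservation": 3,
        "Documentation": 3, "Data Integrity": 2,
        "Quality and Usage": 3, "Governance": 2, "Metadata": 2},
}

for name, levels in assessments.items():
    print(name, category_rating(levels, OPERATIONAL), category_rating(levels, STEWARDSHIP))
# Radar-based Precipitation Climatology 2.4 2.3
# QuWind100 3.0 2.3
# Canadian homogenized temperature 2.4 2.3
```

Under this rule a single weak aspect, such as Data Integrity at level 1, pulls the Operational Data Management rating down noticeably, which matches the pattern seen in the two DWD assessments.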

6.4 Fundamental Temperature, Precipitation and Humidity Over Brazil (SMM-CD_NRP)

The National Meteorological Service of Brazil, hereafter INMET, makes the Brazilian Climate Data (Brazilian-CD) freely available online to the public as CSV text files. The Brazilian-CD comprises data from all observing weather stations operated and maintained by INMET, whether automatic or conventional, with the total number varying by year, from about 400 stations in 2000 to 834 in 2020. The WMO SMM-CD_NRP was used to assess the temperature, humidity and precipitation data of the Brazilian-CD. The averaged stewardship maturity rating levels for the Operational Data Management and Data Stewardship categories are 2.5 and 1.7, respectively. The assessment was originally made by an expert from INMET and was moderated by a member of the WMO ET-DDS team.

In the Operational Data Management category, the online publication of the Brazilian-CD meets important criteria for public dissemination, such as data access, portability and preservation. Within the same category, however, there is room to improve documentation and data integrity, since information on these aspects, or even their application, is lacking on the website. In the Data Stewardship category, no quality assurance or quality control procedures are described in the online documentation of this dataset, although subsequent communication with the assessment point of contact (POC) clarified that the dataset undergoes a routine quality control procedure. Furthermore, although the dataset is provided by INMET, no explicit information on governance or a POC is given, which lowers the score for this aspect. The Brazilian-CD is available online with minimal metadata, covering the observing station location, altitude and period of operation. In addition, climate normals can be found as figures and tables by consulting the station map, although this information is not included in the downloadable CSV file. Nonetheless, INMET has recently made substantial progress in improving the information content of the Brazilian Climate Data for users, as reported nationally and acknowledged by relevant centres such as NOAA CPC and IRI.

6.5 DRIAS 2020 – National Climate Simulations Over France (SMM-CD_NRP)

The new DRIAS 2020 dataset provides high resolution bias-adjusted climate projections over France in a variety of graphical or numerical forms.

Through the EURO-CORDEX initiative (), regional climate models were run on a limited-area domain covering Europe at a horizontal resolution of 12 km. The regional climate simulations were dynamically downscaled from global climate projections of the Coupled Model Intercomparison Project – Phase 5 (). The DRIAS service identified a consistent subset of this EURO-CORDEX ensemble of climate projections that is more manageable for decision-making and impact studies while preserving the range and characteristics of regional responses in metropolitan France. All simulations were produced for the RCP4.5 and RCP8.5 emission scenarios, and eight simulations are also available for RCP2.6. They cover the period 1971-2100 (1971-2005 being the historical reference part of the simulations and 2006-2100 the climate projection part). The ADAMONT () bias adjustment method was then applied over France to the EURO-CORDEX climate model outputs, using the SAFRAN () reanalysis data as reference. The resulting simulations are available at a daily time step on an 8 × 8 km horizontal grid and form the new DRIAS 2020 dataset.
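ADAMONT belongs to the family of quantile-mapping bias-adjustment methods. As an illustration of the general idea only, the Python sketch below applies a plain empirical quantile mapping: each model value is converted to its non-exceedance probability within the model's historical sample and replaced by the value at the same probability in a reference sample (the role SAFRAN plays for DRIAS 2020). ADAMONT's own refinements are not reproduced, values outside the historical range are simply clamped, and all names and numbers are hypothetical.

```python
from bisect import bisect_right


def quantile(sorted_vals: list[float], p: float) -> float:
    """Linearly interpolated quantile of a sorted sample, for p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def cdf(sorted_vals: list[float], x: float) -> float:
    """Inverse of `quantile`: probability position of x in a sorted sample, clamped to [0, 1]."""
    if x <= sorted_vals[0]:
        return 0.0
    if x >= sorted_vals[-1]:
        return 1.0
    i = bisect_right(sorted_vals, x) - 1  # sorted_vals[i] <= x < sorted_vals[i + 1]
    lo, hi = sorted_vals[i], sorted_vals[i + 1]
    return (i + (x - lo) / (hi - lo)) / (len(sorted_vals) - 1)


def quantile_map(model_hist, reference, values):
    """Map model values onto the reference distribution via matching probabilities."""
    model_sorted, ref_sorted = sorted(model_hist), sorted(reference)
    return [quantile(ref_sorted, cdf(model_sorted, v)) for v in values]


# Hypothetical samples: the model runs 2 units too low over the historical period.
print(quantile_map([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 4.0, 5.0, 6.0, 7.0], [2.5, 4.0, 10.0]))
# [4.5, 6.0, 7.0]
```

In practice the transfer function is built from the historical overlap period and then applied to the projection period, which is why the treatment of values beyond the historical range (clamped here) matters for future climates.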

Aspects in the Operational Data Management category generally show a high level of maturity: the Data Access, Portability, Preservation and Documentation aspects received a Level 3 rating. Data can be accessed online after a quick registration procedure, a catalogue of data and products is available, and users can select and download a subset of the dataset in ASCII or NetCDF format. For the preservation aspect, there is a multi-site backup system comprising the Météo-France archive and the CNRM ESGF archive system. Considerable effort has been put into documentation, with dedicated “Education” and “Discover” sections. Data Integrity is an aspect that can be improved, as there is no online documentation of data integrity checks.

The Data Stewardship aspects are rated between levels 2 and 3. Quality and Usage is at level 2.5: data quality is assessed, but no specific documentation on this aspect is available. However, the national report “The climate of France in the 21st century” is based on the DRIAS dataset and products, demonstrating its high profile. As for governance, DRIAS is a multi-partner endeavour between the Direction of Climatology at Météo-France (coordination and services implementation) and the main French organizations involved in climate modelling (IPSL, CERFACS and CNRM-GAME), which, in addition to climate simulations, contribute their scientific expertise on how to use tools and interpret results. User support is also provided. The Metadata aspect is at level 2 for the ASCII data but at level 3 for the NetCDF data, which follow the CMOR standard.

7 Summary

We have described the development of a structured and standards-based method of informing both data users and data managers of climate datasets about recommended stewardship practices, as well as providing an assessment process. The Stewardship Maturity Matrix for Climate Data (SMM-CD), which has now been approved by the WMO Congress, is a self-assessment tool to score the maturity of individual data products that are global in scope. There are four categories (data access, usability and usage, quality management, and data management) each with a number of aspects, which are assessed to be at one of five levels. Datasets which have been assessed by the SMM-CD are collected in the WMO Climate Data Catalogue, which will grow over time.

Furthermore, a matrix for national and regional purposes (SMM-CD_NRP) has been derived from the SMM-CD by merging its categories and aspects into a smaller set with two categories (operational data management and data stewardship). The SMM-CD_NRP is intended for data products that are produced more operationally and cover a smaller geographic area. Such products are more likely to be produced by NHMSs and to have a lower processing level, and hence to be closer to the basic observations and the climate data record. We also present a number of case studies in which these matrices have been applied.

The SMM-CD_NRP has already proven to be of great value in assessing the way climate datasets are provided. Although standardization is not the goal of its implementation, the results from the case studies presented herein show differences in the averaged scores in both categories, with differences slightly more evident in data stewardship practices. Comparing the case studies, we noticed that evidence on important aspects, such as quality assurance, quality control and governance, is not easily available for some datasets. In addition, not all NetCDF datasets follow the CF compliance requirements for metadata. Nonetheless, we also note that, at this first implementation stage of the matrix, the climate datasets evaluated have good data stewardship practices, which will give confidence to their users, as they scored very close to the highest mandatory level in most aspects. This is already an important achievement of the SMM-CD_NRP, since providers can use the assessment results to improve their data services.

The WMO manual on HQ-GDMFC will incorporate this matrix and be distributed to all Member States and their NHMSs. In addition, this manuscript, the matrix and its guidance notes will serve as a reference for data managers and others managing and producing climate data, for the application of the SMM-CDs in monitoring climate change and in climate services. As with any other set of standards, this manual is subject to updates and amendments as needs arise and requirements evolve. The WMO Standing Committee on Climate Services has this in its mandate.

Going forward, we envisage that assessment against these maturity matrices will become a regular part of dataset development and release, in both climate and forecasting services. The new WMO SERCOM ET-DRC will continue the work of the ET-DDS and IEG-CDM by adding more datasets at the global, regional and national levels to the WMO Climate Data Catalogue and by updating those already included where necessary. This catalogue is to become part of the new WMO Climate Data Portal in the near future.