ECDS – A SWEDISH RESEARCH INFRASTRUCTURE FOR THE OPEN SHARING OF ENVIRONMENT AND CLIMATE DATA

Environment Climate Data Sweden (ECDS) is a new Swedish research infrastructure furthering the reuse of scientific data in the domains of environment and climate. ECDS consists of a technical infrastructure and a service organization supporting the management, exchange, and reuse of scientific data. The technical components of ECDS include a portal and an underlying data catalogue with information on datasets. The datasets are described using a metadata profile compliant with international standards. The datasets accessible through ECDS can be hosted by universities, institutes, or research groups, or at the new Swedish federated data storage facility Swestore of the Swedish National Infrastructure for Computing (SNIC).


INTRODUCTION
The last decades have seen a substantial increase both in the production of and in the need for high-quality scientific data. This is particularly evident in Earth system sciences, where the complexity of models is increasing through the inclusion of new processes and interactions and through more accurate descriptions of the different components of the Earth system. The rapidly increasing amount of generated scientific data places growing demands on scientists to properly document, store, share, and integrate these data. Easy data discovery, interoperability, and open data access are important prerequisites for the efficient reuse of data. Following the earlier paradigms of experimental, theoretical, and computational science, data-intensive science has emerged as a fourth paradigm, encompassing new technologies for the analysis, visualization, and exploration of huge amounts of data (e.g., Hey et al., 2009). The growing complexity of international eResearch collaborations creates a need for appropriate national data sharing policies, legal frameworks, and data management practices (Fitzgerald et al., 2009).
Data sharing has been a central theme of many international initiatives, from both a technical and a data policy perspective. Selected recent benchmarks include the data sharing principles of the Global Earth Observation System of Systems (GEOSS) implemented by the Group on Earth Observations (GEO, 2012), the International Polar Year (IPY) 2007-2008 (IPY, 2012), and the European INSPIRE directive (INSPIRE, 2012). More detailed discussions of this topic can be found in existing reviews (e.g., Uhlir et al., 2009). Despite numerous efforts, substantial issues still hamper the reuse of scientifically relevant data (e.g., Nelson, 2009). A recent investigation of the status of environment and climate data sharing in Sweden (Eklundh, 2008) confirmed many of these problems. In summary, scientists face three types of challenges: lack of capabilities for the discovery and efficient sharing of data; problems with the use of data (e.g., lack of interoperability, standardized formats, and tools); and general access problems (reluctance of data producers to share data). Researchers experience difficulties in accessing data produced both by scientists and by agencies operating under a cost-recovery requirement. A discussion of the latter is beyond the scope of this article (see, e.g., Uhlir et al., 2009).
Enhancing the sharing of scientific data requires incentives for scientists. Such incentives can range from explicit requirements for open data access by research funding agencies to opportunities for new research (OECD, 2007) and higher citation rates for data sharers. Scientific publishers put increasing weight on open data access, and a new paradigm of scientific publishing is being established by journals that focus on data collection rather than on data interpretation and analysis (e.g., ESSD, 2012). Such journals help to integrate data sharing as a natural element of the scientific process, with direct benefits for the individual scientific career (Carlson, 2011).
In addition to increased incentives for scientists, enhanced reuse of high-quality environment and climate data depends on infrastructures providing practical support to scientists for data discovery, integration, management, and sharing. Environment Climate Data Sweden (ECDS) is a Swedish infrastructure created for that purpose. This article provides a brief description of the ECDS initiative, the services it provides for the scientific community, and selected first achievements, challenges, and further development plans.

Infrastructure design
Environment Climate Data Sweden (ECDS) is a new Swedish research infrastructure facilitating the search, publication, and long-term accessibility of data for environment and climate research. ECDS consists of a clearinghouse mechanism, enabling the search and publication of relevant data, and a service organization, providing additional support to scientists throughout the whole research process. ECDS is a joint undertaking of the Swedish Research Council (SRC) and the Swedish Meteorological and Hydrological Institute (SMHI), in collaboration with the National Supercomputer Centre (NSC) at Linköping University. The initial funding period for ECDS extends from 2009 until 2013, after which the infrastructure is expected to be operational with the functionalities and user services described below.
Technically, ECDS consists of a website, a data portal, and the external data storage facility Swestore. The website (ECDS, 2012) provides information and support material to both experienced and new users and is the main entry point to the ECDS data portal. The data portal (Figure 1) uses the open-source software GeoNetwork (GeoNetwork, 2012) and a PostgreSQL database to manage standardized dataset descriptions. GeoNetwork also provides the user interface for searching and publishing data. The ECDS data documentation standard (metadata profile) is a subset of the widely used international standard ISO 19115:2003 (ISO, 2003) and is referred to as "ISO 19115:2003 ECDS". The ECDS metadata profile has recently been extended to include the mandatory metadata elements for geospatial data defined by the European INSPIRE directive (INSPIRE, 2012). The ECDS data resource descriptions are grouped into thematic categories using the Global Change Master Directory (GCMD) Earth Science Keywords (Olsen et al., 2007). The use of well-documented and widespread standards allows for the efficient exchange of metadata with other portals and thus supports the establishment of federated global collections of scientific data resources, such as GEOSS. ECDS is already a registered component of GEOSS with documented technical interfaces for metadata exchange.
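To illustrate what checking a dataset description against a metadata profile involves, the following Python sketch validates a record against a small set of required elements. The element names, the required set, and the GCMD path check are hypothetical simplifications loosely modeled on ISO 19115 and INSPIRE core elements. They are not the actual "ISO 19115:2003 ECDS" profile.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class DatasetDescription:
    """Hypothetical, simplified record; NOT the real ECDS metadata profile."""
    title: str = ""
    abstract: str = ""
    point_of_contact: str = ""
    # GCMD-style keyword paths, e.g. "EARTH SCIENCE > ATMOSPHERE > ..."
    gcmd_keywords: List[str] = field(default_factory=list)
    # (west, south, east, north) in decimal degrees
    bbox: Optional[Tuple[float, float, float, float]] = None
    temporal_extent: Optional[Tuple[date, date]] = None
    lineage: str = ""
    access_conditions: str = ""


# Assumed set of mandatory free-text elements (illustrative only).
MANDATORY_TEXT = ("title", "abstract", "point_of_contact", "lineage", "access_conditions")


def validate(record: DatasetDescription) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in MANDATORY_TEXT:
        if not getattr(record, name).strip():
            problems.append(f"missing {name}")
    if not record.gcmd_keywords:
        problems.append("missing gcmd_keywords")
    elif not all(k.upper().startswith("EARTH SCIENCE >") for k in record.gcmd_keywords):
        problems.append("keyword not in GCMD 'EARTH SCIENCE > ...' form")
    if record.bbox is None:
        problems.append("missing bbox")
    else:
        west, south, east, north = record.bbox
        # west > east is allowed here (box crossing the antimeridian).
        if not (-180 <= west <= 180 and -180 <= east <= 180 and -90 <= south <= north <= 90):
            problems.append("bbox out of range")
    if record.temporal_extent is None:
        problems.append("missing temporal_extent")
    elif record.temporal_extent[0] > record.temporal_extent[1]:
        problems.append("temporal_extent start after end")
    return problems


record = DatasetDescription(
    title="Ozone soundings", abstract="Vertical ozone profiles",
    point_of_contact="data@example.org",
    gcmd_keywords=["EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC CHEMISTRY"],
    bbox=(11.0, 55.0, 24.0, 69.0),
    temporal_extent=(date(2007, 1, 1), date(2008, 12, 31)),
    lineage="Balloon soundings, quality controlled", access_conditions="CC BY",
)
print(validate(record))  # []
```

In a real catalogue, such checks would typically run against the XML encoding of the record (ISO 19139) rather than against an in-memory object.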
As a complement to its data discovery service, ECDS is also working on the establishment of a Swedish data repository for environment and climate data. The ECDS data storage capacity is implemented in collaboration with the Swestore initiative (Swestore, 2012) of the Swedish National Infrastructure for Computing (SNIC). To the user, Swestore appears as a single system, although parts of its capacity are distributed among the six national high-performance computing centres participating in SNIC. The system is flexible and expandable and is intended as a versatile long-term storage facility for users from many scientific domains. At present, ECDS uses only a small storage allocation of 10 terabytes at Swestore. This allocation is primarily intended for users with small data storage needs, of the order of a few gigabytes, who do not have easy access to alternative storage options. Users with larger storage demands need to apply for data storage at Swestore directly. So far, most ECDS users store their data in their university's or research organization's own data repositories and provide only metadata about their data resources to ECDS (Figure 1). Whether ECDS takes on an extended role as a Swedish repository for data, in addition to metadata, depends on stakeholder decisions on the ECDS mandate and resources beyond 2013.
The requirements for the ECDS system and services have been identified with the help of a Swedish reference user group consisting of experts from several Swedish universities and different environment and climate research fields. Additional information on ECDS can be found on the ECDS website (ECDS, 2012).

Support services offered to users
At present, ECDS offers three types of services to users (Figure 2). First, users can contact the ECDS helpdesk for scientific and technical advice on issues such as the documentation, organization, or storage of data, or the writing of a data publication plan. In addition to personal support, ECDS users can access complementary material and answers to frequently asked questions on the ECDS website.
The second service is data discovery through the ECDS data portal. The portal enables users to search for environment and climate data using selection criteria such as free text, thematic keywords (GCMD Earth Science Keywords), or geospatial and temporal constraints.
The third service is the publication of standardized metadata descriptions of environment and climate datasets. Users who want to publish data resource descriptions at ECDS need an account. All new entries of data resources are reviewed by an ECDS metadata expert to ensure consistency of the dataset documentation with the ECDS metadata standard and to help data owners make their data discoverable, well documented, and usable by other scientists. If the reviewer decides that corrections or additions are needed, the dataset description is returned to the provider; depending on the extent of the revisions, the revised metadata may be reviewed again. Once approved, a data resource description is included in the ECDS metadata catalogue (PostgreSQL database) and can be discovered by other users.
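The selection criteria offered by the data discovery service combine naturally as a conjunction of filters. The Python sketch below applies free-text, keyword, bounding-box, and time-period criteria to an in-memory list. The `Entry` type, its fields, and the sample records are invented for illustration. A production portal such as GeoNetwork queries a search index rather than scanning records linearly.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple

# (west, south, east, north) in decimal degrees
BBox = Tuple[float, float, float, float]


@dataclass
class Entry:
    """A hypothetical, minimal stand-in for a catalogue record."""
    title: str
    abstract: str
    keywords: List[str]
    bbox: BBox
    start: date
    end: date


def bbox_intersects(a: BBox, b: BBox) -> bool:
    # Axis-aligned overlap test; boxes crossing the antimeridian are not handled.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(entries: Iterable[Entry],
           text: Optional[str] = None,
           keyword: Optional[str] = None,
           bbox: Optional[BBox] = None,
           period: Optional[Tuple[date, date]] = None) -> List[Entry]:
    """Return entries matching all criteria that were given (logical AND)."""
    hits = []
    for e in entries:
        if text and text.lower() not in f"{e.title} {e.abstract}".lower():
            continue
        if keyword and not any(keyword.upper() in k.upper() for k in e.keywords):
            continue
        if bbox and not bbox_intersects(e.bbox, bbox):
            continue
        # Temporal overlap: entry interval [start, end] intersects the query period.
        if period and not (e.start <= period[1] and period[0] <= e.end):
            continue
        hits.append(e)
    return hits


catalogue = [
    Entry("Sea ice thickness, Svalbard", "Drill-hole measurements",
          ["EARTH SCIENCE > CRYOSPHERE > SEA ICE > ICE DEPTH/THICKNESS"],
          (10.0, 76.0, 30.0, 81.0), date(2007, 3, 1), date(2008, 9, 30)),
    Entry("Baltic Sea salinity", "Monthly CTD profiles",
          ["EARTH SCIENCE > OCEANS > SALINITY/DENSITY > SALINITY"],
          (10.0, 54.0, 30.0, 66.0), date(2000, 1, 1), date(2010, 12, 31)),
]

arctic_2007 = search(catalogue, bbox=(0.0, 70.0, 40.0, 90.0),
                     period=(date(2007, 1, 1), date(2007, 12, 31)))
print([e.title for e in arctic_2007])  # ['Sea ice thickness, Svalbard']
```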

ECDS access requirements and data policy
ECDS builds upon the vision of full, open, and unhindered access to environment and climate data. As a consequence, the ECDS system is openly accessible to the national and international research community. Using a standard web browser, any user can search for environment and climate data. Registration of metadata for a data resource is open to any user who applies for an account and provides metadata entries in line with the ECDS metadata standard and quality requirements. Users are generally expected to store the data at their university's data storage facilities, but some data may also be stored in the ECDS repository at Swestore. All datasets hosted by the ECDS repository at Swestore are explicitly required to be openly accessible, e.g., under a Creative Commons Attribution license (CC_BY, 2012). Such a license allows the reuse of data but requires attribution, so that the provider receives scientific credit through citation. The ECDS metadata profile provides a "dataset citation" section, where a metadata provider can insert information on the article(s) relevant to the creation of the respective dataset. This metadata element can also be used to state how the dataset should be cited when used in other research. Besides ensuring credit for the data provider, this element also supports the traceability of data use in research. If the collection or creation of a dataset is described in a peer-reviewed scientific publication, the element can also serve as a first quality indicator for the dataset. In addition, the ECDS metadata profile contains a dedicated section where the data provider can give a thorough description of the data quality.
Since 2012, open data access has also been an explicit requirement for projects funded by the SRC and the Swedish Research Council for Sustainable Development (Formas). ECDS strongly recommends open access also for datasets whose metadata are hosted at ECDS but that are not subject to explicit open access requirements and do not use the ECDS data repository at Swestore. Exceptions to open access will only be tolerated in selected cases, e.g., for security, privacy, or ethical reasons. ECDS' present recommendation of open access may be strengthened to a requirement in the future, should this prove necessary to achieve the initiative's main goal of facilitating the reuse of scientific data. Commercial data resources are not accommodated by ECDS. Data from government agencies fall under the INSPIRE Directive (INSPIRE, 2012) and should therefore be made available by the individual agencies themselves.

Collaborations
ECDS collaborates with many national and several international organizations and initiatives. National collaborations include Swedish universities, research centres, funding agencies, and governmental agencies. Collaboration topics range from concrete data management questions to advocacy for the full and open sharing of environment and climate data. Selected examples of national collaborations include joint activities with the Swedish National Data Service (SND), ECDS' sister organisation within the humanities, social, and health sciences (SND, 2012), and with Swedish LifeWatch, a national research infrastructure for biodiversity data (SLW, 2012). ECDS and SND collaborate on technical issues, such as data storage, management, and dataset identification, as well as on open access to research data in general. ECDS and Swedish LifeWatch collaborate on the integration of data discovery facilities.
Internationally, ECDS has taken over responsibility for the discovery of Swedish research data collected during the most recent IPY (2007-2008). IPY was jointly supported by the International Council for Science (ICSU) and the World Meteorological Organization (WMO) and closed in 2010. After the end of the funding period for the Swedish IPY portal, the metadata for Swedish IPY datasets were migrated from the IPY metadata profile to the ECDS metadata profile, making the Swedish IPY data resources discoverable through the ECDS portal. ECDS also supports international data coordination initiatives such as GEO/GEOSS and has chosen technical standards that will allow for data exchange within the European INSPIRE community.

First achievements and challenges
The ECDS data portal has been operational since June 2011 and started with no data resource descriptions. However, a decision had already been taken that ECDS should take over stewardship of data resource descriptions related to Swedish research activities during the most recent IPY and provide for continued discovery of data resources from these projects. With the inclusion of these resources, the content increased to a total of 60 datasets. After one and a half years of operation, the number of data resources discoverable through ECDS had increased to 115 by December 2012, provided by 52 publishers. The data resources are highly heterogeneous and range from individual datasets of varying size to major compilations of data, fully fledged relational databases, and other portals. An assessment of the total number of datasets shared through ECDS will thus depend on the level of granularity chosen for the definition of a dataset; this article only counts data resources, i.e., entries registered in the ECDS data portal. Figure 3 provides an overview of the resources currently shared through ECDS, classified by scientific category according to the GCMD science keywords thesaurus. Notably, about half of the data resources are related to Swedish IPY research activities. So far, approximately two thirds of the data resources registered at ECDS fall into the atmosphere and ocean domains.
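The note to Figure 3 (139 classified records versus 115 registered records) follows from counting a record once under every GCMD topic it is tagged with. The following is a minimal Python sketch. It assumes that each record carries GCMD keyword paths of the form "EARTH SCIENCE > TOPIC > ..." and uses made-up records. It is not the procedure ECDS used to produce the figure.

```python
from collections import Counter
from typing import Dict, List


def topic_of(keyword: str) -> str:
    """Return the topic level of a GCMD science keyword path.

    "EARTH SCIENCE > ATMOSPHERE > AEROSOLS" -> "ATMOSPHERE"
    """
    parts = [p.strip().upper() for p in keyword.split(">")]
    return parts[1] if len(parts) > 1 else parts[0]


def count_by_topic(records: Dict[str, List[str]]) -> Counter:
    """Count records per topic; a record counts once for each distinct topic."""
    counts: Counter = Counter()
    for keywords in records.values():
        for topic in {topic_of(k) for k in keywords}:
            counts[topic] += 1
    return counts


# Made-up records: "b" is tagged with two topics; "a" has two keywords
# under the same topic and is therefore counted only once for it.
records = {
    "a": ["EARTH SCIENCE > ATMOSPHERE > AEROSOLS",
          "EARTH SCIENCE > ATMOSPHERE > CLOUDS"],
    "b": ["EARTH SCIENCE > OCEANS > SEA ICE",
          "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC TEMPERATURE"],
    "c": ["EARTH SCIENCE > OCEANS > SALINITY/DENSITY"],
}
counts = count_by_topic(records)
print(sorted(counts.items()))               # [('ATMOSPHERE', 2), ('OCEANS', 2)]
print(sum(counts.values()), len(records))   # 4 3
```

As in Figure 3, the sum over all categories (here 4) exceeds the number of records (here 3) whenever some records belong to more than one category.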
The current rate of data resource registrations at ECDS can be compared with an estimate of the production rate of new environment and climate data resources in Sweden. According to the latest statistics from the Swedish National Agency for Higher Education (HSV, 2012), the annual number of PhD degrees awarded in geosciences at Swedish universities is of the order of 600. Under the ad hoc assumption that at least half of these PhD projects produce at least one high-quality environment and climate data resource worth sharing with the wider scientific community, this group alone produces a lower benchmark of 300 new data resources annually. In addition, environment and climate data are produced in other scientific domains, such as the technical, agricultural, or social sciences, as well as outside PhD projects. Thus, there is a large discrepancy between the current inflow of data registrations into the ECDS portal (115 over one and a half years) and the estimated production rate of new data resources in Sweden (several hundred per year). This discrepancy illustrates both the challenges and the potential of scientific data sharing, consistent with international (Nelson, 2009) and national findings (Eklundh, 2008). Improvements can be expected as a result of the new requirements for open data access introduced by the major Swedish research funding agencies SRC and Formas. However, as these requirements entered into force in 2012, it may take several years until projects funded under them have created new data and can contribute data resource descriptions to the ECDS data portal. Other incentives are needed for data without explicit open access requirements. In the experience of ECDS so far, implicit data sharing incentives, such as higher citation rates for data sharers, opportunities for new research, and overall long-term benefits for science and society (e.g., OECD, 2007; Uhlir et al., 2009), still provide insufficient motivation for individual scientists to prioritize data sharing in their scientific work. ECDS' discussions with individual scientists known to host interesting environment and climate data resources suggest that the unpaid burden of data quality control, documentation, organization, and dissemination is a concrete hindrance to data sharing. A potential solution could be improved funding options for data curation, including funding for comparatively small efforts of the order of one person-month, to enable individual scientists to revisit, quality control, document, organise, and share existing research data that are currently not accessible and risk being lost in the future. More explicit reward mechanisms for data sharing are also proposed. One option could be to give extra credit to research funding applicants with a strong record of data sharing.
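The rate comparison in the paragraph above can be made explicit using only the figures given in the text. This is a rough sketch. It assumes that the 60 migrated IPY records were all in the catalogue at the start of the 18-month period, as the description of the portal's first months suggests. The factor 0.5 is the ad hoc share of PhD projects stated above.

```latex
\begin{aligned}
r_{\text{reg}} &\approx \frac{115}{1.5\,\text{yr}} \approx 77\,\text{yr}^{-1},
&\quad r_{\text{reg,new}} &\approx \frac{115-60}{1.5\,\text{yr}} \approx 37\,\text{yr}^{-1},\\
r_{\text{prod}} &\gtrsim 0.5 \times 600\,\text{yr}^{-1} = 300\,\text{yr}^{-1},
&\quad \frac{r_{\text{prod}}}{r_{\text{reg}}} &\gtrsim \frac{300}{77} \approx 3.9 .
\end{aligned}
```

If the migrated IPY records are excluded, the ratio rises to roughly 300/37, or about 8.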

Short-term perspective
By the end of its first funding period (2013), ECDS will implement several additional services for more sophisticated discovery, distribution, and visualization of environment and climate data hosted at Swestore. One element is the implementation of the THREDDS (Thematic Real-time Environmental Distributed Data Services) middleware (THREDDS, 2012) to simplify the discovery and use of scientific data hosted at Swestore. THREDDS also supports Open Geospatial Consortium services (OGC, 2012), such as the Web Map Service (WMS) and the Web Coverage Service (WCS), which is expected to facilitate the use of ECDS data resources in, e.g., the GIS community. An additional service will be based on the open-source tool suite iRODS (integrated Rule-Oriented Data System) (IRODS, 2012). iRODS will be implemented at Swestore in 2013, enabling improved organization, management, and administration of ECDS data hosted at the Swestore facility (Figure 1).
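A WMS endpoint lets GIS clients request map images with a plain HTTP GET request. The Python sketch below builds a WMS 1.3.0 GetMap request. The server URL and layer name are hypothetical. THREDDS servers commonly expose WMS under a path like /thredds/wms/..., but the details depend on the installation. The optional TIME parameter is only meaningful for layers with a time dimension.

```python
from urllib.parse import urlencode

# Hypothetical THREDDS WMS endpoint; not an actual ECDS/Swestore URL.
BASE_URL = "https://thredds.example.org/thredds/wms/ecds/example_dataset.nc"


def getmap_url(layer, bbox, width=512, height=512, time=None):
    """Build a WMS 1.3.0 GetMap request URL.

    bbox is (min_lon, min_lat, max_lon, max_lat). CRS:84 is used so that the
    axis order is lon/lat; with EPSG:4326, WMS 1.3.0 expects lat/lon order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",              # empty means the server's default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    if time is not None:
        params["TIME"] = time      # ISO 8601 timestamp
    return BASE_URL + "?" + urlencode(params)


# Usage: a map of a hypothetical "air_temperature" layer over Sweden.
print(getmap_url("air_temperature", (10.0, 55.0, 25.0, 70.0),
                 time="2012-06-01T00:00:00Z"))
```

Note that `urlencode` percent-encodes the commas in BBOX and the colons in CRS and TIME, which conforming servers decode transparently.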
During 2013, ECDS will also work on integrating its portal with other initiatives. This activity will include two parts:
- harvesting of metadata from other initiatives, including government agency data, to enable discovery of their content through ECDS;
- delivery of metadata from ECDS to other portals to enable discovery of ECDS-hosted resources through those portals.
The first part will be demonstrated in collaboration with Swedish partners, such as Swedish LifeWatch and the Swedish Environmental Protection Agency. Regarding the second part, ECDS will commit to integrating its portal with the portal developed by GEO. Subject to the agreement of the respective data resource owner, ECDS data resource registrations will be marked with metadata tags enabling automatic metadata exchange with the GEO portal and inclusion in the GEOSS Data Collection of Open Resources for Everyone (DataCORE).
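The text does not specify the harvesting protocol. Assuming that a partner catalogue offers an OGC Catalogue Service for the Web (CSW) interface, which GeoNetwork supports, harvesting amounts to paging through GetRecords responses. The endpoint URL below is hypothetical. `fetch_page` is a hypothetical callable that stands in for the HTTP request and XML parsing, so that the paging logic can be shown on its own.

```python
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlencode

# Hypothetical endpoint; GeoNetwork instances typically expose CSW at a path
# like ".../geonetwork/srv/eng/csw", but this host is invented.
CSW_URL = "https://catalogue.example.org/geonetwork/srv/eng/csw"

# A page fetcher takes a request URL and returns (records, next_record).
# Parsing the XML response is left to the fetcher in this sketch.
FetchPage = Callable[[str], Tuple[List[Dict], int]]


def getrecords_url(start: int, page_size: int) -> str:
    """Build a CSW 2.0.2 GetRecords request (KVP encoding) for one page."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "full",
        "startPosition": start,
        "maxRecords": page_size,
    }
    return f"{CSW_URL}?{urlencode(params)}"


def harvest(fetch_page: FetchPage, page_size: int = 50) -> List[Dict]:
    """Page through the remote catalogue until it reports no further records."""
    records: List[Dict] = []
    start = 1  # CSW record positions are 1-based
    while True:
        page, next_record = fetch_page(getrecords_url(start, page_size))
        records.extend(page)
        # nextRecord == 0 signals the end; stopping on an empty page or a
        # non-advancing position also guards against misbehaving servers.
        if not page or next_record <= start:
            return records
        start = next_record
```

In practice, `fetch_page` might call `urllib.request.urlopen` and read the `nextRecord` attribute of the `csw:SearchResults` element. To retrieve ISO 19115 records rather than `csw:Record` summaries, catalogues typically accept `outputSchema=http://www.isotc211.org/2005/gmd` together with `typeNames=gmd:MD_Metadata`.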
ECDS also plans to use the Digital Object Identifier (DOI) system. However, this requires the tracking and persistent storage of unchanged datasets and will thus not be applicable to externally hosted data, for which these criteria cannot be guaranteed.
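One common way to verify that a dataset has stayed unchanged, as DOI assignment requires here, is a fixity check. A cryptographic digest is recorded when the dataset is registered, and the file is compared against that digest later. The Python sketch below uses SHA-256 from the standard library. The file name and contents are invented, and the text does not state which mechanism ECDS would use.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks (1 MiB default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded at registration."""
    return sha256_of(path) == recorded_digest


# Usage: record a digest at registration time and re-check it later.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "dataset.csv"
    data_file.write_text("station,temp\nA,1.5\n")
    registered = sha256_of(data_file)
    print(is_unchanged(data_file, registered))  # True
    data_file.write_text("station,temp\nA,1.6\n")
    print(is_unchanged(data_file, registered))  # False
```

Reading in chunks keeps memory use bounded even for multi-gigabyte files.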

Outlook
During 2013, ECDS will submit a proposal to the initiative's major sponsor, SRC, for the continuation and further development of its infrastructure for the period 2014 to 2018. The outlook on the further work of ECDS provided in this section is thus indicative and subject to the decisions of the initiative's stakeholders during 2013. Experience from the first ECDS funding period and findings from a recent external evaluation of the national research infrastructures initiated by SRC (SRC, 2012) provide important guidance for the work of ECDS beyond 2013. The evaluation focused on issues of organisation, management, and accessibility of ECDS and ten other Swedish national research infrastructures. Clearly, there is a need to enhance support, outreach, and advocacy for data sharing in order to increase the rate of data resource registrations in the ECDS portal. Specifically, ECDS will need to initiate discussions on open data access requirements with funding bodies for environment and climate research other than SRC and Formas. This activity will need to be complemented by a coordinated effort to identify existing environment and climate data treasures that are difficult for the scientific community to find and use, in particular data on the verge of being lost and data that have not yet been digitized or converted to widely usable formats.
Regarding ECDS services, the use of established international standards in ECDS provides an opportunity to further develop its connections to both national and international initiatives, in particular other portals. The ECDS services established so far are also likely to be improved gradually, based on the evolving requirements of its user group and on new technical developments. According to SRC (2012), ECDS needs stronger international links and a clarification of its mandate as a data repository. In the forthcoming application to SRC, ECDS will thus propose to establish an internationally integrated Swedish data repository for environment and climate data. The repository will primarily target data outside the scope of other Swedish national research infrastructures.

CONCLUSIONS
ECDS is a new Swedish infrastructure responding to increasing needs for improved discovery, documentation, integration, management, and sharing of environment and climate data to support research. Initially, ECDS focused on the development of a metadata profile for environment and climate data compliant with international standards, the implementation of a system for the management of data resource descriptions, and tools for the search and registration of environment and climate data resources. The ECDS system has been operational since June 2011, together with a helpdesk providing scientific and technical support to researchers. The system will be gradually improved through additional services for data integration and visualization as well as through integration with other data portals and services.
So far, the inflow of environment and climate data resource registrations into the ECDS portal is only moderate. However, ECDS is a long-term initiative for data sharing. In both the international and national research communities, there is a growing understanding of the importance of scientific data sharing as a prerequisite for new scientific findings, knowledge, and services, underpinning the response to the rapid environmental and socioeconomic challenges that planet Earth and mankind are facing. Achieving a paradigm shift in scientific data sharing will require actions such as:
- the provision of technical solutions, services, and support for data management, discovery, access, use, and sharing, both at a national level and locally at universities;
- better and earlier education of scientists in the above topics during their university career;
- improved scientific reward mechanisms (e.g., more options for the scientific publication of data articles and increased credit for scientists' data sharing records);
- improved funding options for data curation, including funding for comparatively small efforts of the order of a person-month, to allow individual scientists to quality control, document, organise, and share research data;
- explicit requirements for open data access from more research funding agencies.
We believe that a concerted and widespread application of these actions is needed to truly establish data sharing as a natural element of the scientific process.

Figure 1.Structure diagram of the ECDS data portal and its connections to Swestore and external data resources

Figure 2. ECDS infrastructure: overview of three main use cases. User support (blue arrows): Users can contact the ECDS helpdesk to get personal scientific and technical advice and can access support material on the ECDS website. Data search (green arrows): The ECDS data portal allows users to discover and access environment and climate data using selection criteria such as free text, thematic keywords (GCMD Earth science keywords), or geospatial and temporal constraints. Data publication (grey arrows): Users who want to publish data resource descriptions at ECDS need to get an account. Before a data resource description is accepted for inclusion in the ECDS metadata catalogue (PostgreSQL database), it is reviewed by an ECDS metadata expert to ensure consistency of the dataset documentation with the ECDS metadata standard. After the data resource description has been approved, it can be discovered by other users.

Figure 3. Data resources (datasets, collections of data, databases, other portals) accessible through the ECDS data portal after the first year of operation. Blue bars: number of records by theme (classification according to the top-level categories used in the GCMD science keywords thesaurus). Red bars: number of records collected within Swedish research activities during IPY 2007/2008. Note: the total number of classified records (139) exceeds the total number of registered records (115) because several records are relevant to multiple categories.