1 Introduction

1.1 Background

The logic of empirical research in the natural sciences, social sciences, linguistics and economics encompasses a context of discovery, a context of justification and a context of exploitation. The context of discovery explains what is to be researched. In the context of justification, the empirical research design (the methods used or developed for that specific research) is conceptualised and applied, yielding data as results. A general model of the empirical data life cycle is given in Figure 1. The context of exploitation sets out what is to happen with the results, either in further research (sharing and reuse of data) or in application (). Quality assurance and quality control (QA/QC) belong to the context of justification and have implications for the exploitation of scientific findings (; ; ). The QA/QC criteria of operator independence (objectivity) and reproducibility are closely related to the QA/QC criterion of validity of the production of measurement data. They must be part of the documentation of scientific data. These criteria are complemented by further quality criteria (; ), which must also be documented and archived as central components of research data management (RDM) and which serve the long-term reusability of scientific data. Data documentation and archiving to ensure the long-term usability of data are not yet sufficiently taken into account in research practice and therefore require special attention as well as sustained and intensive activities (; ; ; ). A frequently cited obstacle to an RDM spanning both the natural and the social sciences is the claim that the differences between empirical and hermeneutic research methods cannot be bridged ().

Figure 1 

Typical Data Life Cycle in Research Processes of Landscape Ecology.

1.2 Objective

This contribution has two aims. First, it aims to integrate RDM structurally and functionally into the research logic of empirical knowledge acquisition and thereby to relativise the separation between the empirical sciences on the one hand and hermeneutically oriented disciplines on the other. By extending Bühler’s communication model (), it shows that linguistic signs in everyday and scientific language are structurally and functionally equivalent to quantitative information (data) (). Second, it presents the study concept and first methodological steps of the project Research of the Management of Research Data in their Life Cycle at Universities and Non-University Research Institutions/Subproject Environmental Sciences. The focus is on outlining RDM in the environmental sciences using the example of the Chair of Landscape Ecology at the University of Vechta.

1.3 Research logic of empirical sciences

The sciences create knowledge by answering previously unanswered questions in a comprehensible manner, publishing the answers in peer-reviewed, quality-checked specialist journals or books and thus making them accessible over the long term. The starting point for gaining scientific knowledge is therefore the difference between the need for information (target) and the availability of information (actual): the existing knowledge, the theory of a scientific subject, has gaps that new research is to reduce or even close. Target/actual discrepancies are regarded as problems to be solved (). They drive the production of data and, in turn, of knowledge. Empirical sciences fill the gaps in their theories by testing scientific assumptions (hypotheses) against quantitatively captured reality. This is done by subdividing complex phenomena that cannot be measured directly, such as ecosystem integrity or health, into directly measurable sub-phenomena. Several less complex indicators thus represent complex indicanda such as ecosystem integrity and health (, ; ). Such indicators include the concentrations of heavy metals and persistent organic pollutants in soils and in the biomass of ecological systems, or the concentration of immunoglobulins in the blood serum and bone marrow of humans, each quantified by specific measuring methods. Theoretical constructs such as ecosystem integrity and health can thus be defined operationally: in addition to assigning a linguistic sign to a meaning, the procedures used to quantify that meaning are also specified. The results of these measurements are called data and are used for the statistical testing of hypotheses. If the hypothesis test is positive, i.e. the provisional explanation is confirmed (verified), the new knowledge complements the theoretical knowledge of the respective scientific discipline.
If this does not succeed (falsification), the examination is repeated, possibly with a modified experimental design that uses other or additional indicators. As in everyday life, the acquisition of scientific knowledge requires that its answers be correct. In empirical sciences, the correctness of the knowledge gained is checked against reality; unlike in hermeneutic sciences, plausibility alone is not a sufficient quality characteristic. Beyond the requirements of everyday knowledge acquisition, scientific knowledge acquisition must fulfil further quality criteria () (Section 1.4).
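The step from measured data to a statistical hypothesis test can be made concrete with a two-sample permutation test. The minimal sketch below assumes hypothetical cadmium concentrations in moss from sites near and far from an emission source. The data, the significance level and the choice of a permutation test are illustrative assumptions, not the methods of the projects described here. Rejecting the null hypothesis of no difference is read as support for the research hypothesis.

```python
import math
import random


def _mean(xs):
    # math.fsum is exactly rounded, so the result does not depend on element order
    return math.fsum(xs) / len(xs)


def permutation_test(a, b, n_permutations=10_000, seed=42):
    """Two-sided permutation test for a difference in means.

    H0: a and b come from the same distribution, so group labels are
    exchangeable. Returns (observed difference of means, p-value).
    """
    rng = random.Random(seed)
    observed = _mean(a) - _mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = _mean(pooled[:n_a]) - _mean(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance guards against rounding
            extreme += 1
    # Counting the observed labelling as one permutation keeps p > 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical Cd concentrations in moss (mg/kg dry weight).
near_source = [0.42, 0.55, 0.61, 0.48, 0.66, 0.59]
far_from_source = [0.21, 0.30, 0.25, 0.28, 0.19, 0.33]

diff, p = permutation_test(near_source, far_from_source)
alpha = 0.05  # significance level fixed before testing
print(f"difference of means = {diff:.3f} mg/kg, p = {p:.4f}")
print("H0 rejected" if p < alpha else "H0 not rejected")
```

A permutation test was chosen here because it makes few distributional assumptions. With real survey data, the choice of test would follow from the scale level and the sampling design.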

1.4 Data collection, hypothesis testing and data interpretation

Space, time and causality are elementary categories of scientific knowledge acquisition (). In the sense of Carl Troll (), the founder of landscape ecology, and his student Otto Fränzle (), the focus of landscape ecology is the quantitative recording and explanation of the spatial structures and functions of ecosystem complexes (landscapes), including their interrelationships (). Such findings are obtained through operationalised, i.e. methodologically supported, definitions of the objects of investigation on which the hypotheses are based. Hypotheses are testable statements that explain a measurable phenomenon. In the empirical sciences, the hypothesis for the question to be clarified in an investigation is formulated so that observation or measurement can decide whether the assumed answer is compatible with the empirical results. In this way, knowledge about the structures and functional relationships of the objects of investigation is expanded. Building on the recording and description of structures and functions, a core component of such explanations is to link causes or boundary conditions (if C …) with the resulting effects (then E …) and to test this conditional link to determine whether it must be rejected (falsified) or can be confirmed (validated) until further notice (). Only validated hypotheses, i.e. hypotheses that are compatible with the state of knowledge and in this sense provisionally ‘true’ (not falsified), are included in the knowledge pool, the theoretical stock of the respective scientific discipline (), and can be used for explanations (How does a system function?), prognoses (How will a system develop under changed boundary conditions?) and technologies (How must the boundary conditions be changed so that a system reaches a desired target state?).

In empirical sciences such as landscape ecology, knowledge is gained by using observation or measurement methods to transform the structures of the system under study and its functions into cognitive relations (models). Structures here means the system's elements and their relations in the form of material, energy and information flows, as well as the spatial and temporal relationships between the elements. The validity of these models must then be demonstrated, for example by statistical hypothesis tests, which usually requires the observations to be quantified. When real structural and functional relations are transformed into numerical relations, quality criteria such as objectivity, reliability and validity, as well as functional, spatial and temporal representativeness, must be met and documented comprehensibly and for the long term. Transparent documentation of the research logic, its quality criteria and the data is a prerequisite for evaluating the validity of scientific findings and for subsequently verifying and reusing the results.

Scientific knowledge is gained in scientific discourses, which, extending the language model of Bühler (), are structured as follows. A transmitter (scientist) uses linguistic signs, observation and measurement data, and statistical parameters derived from them to inform receivers about structurally, functionally, temporally and spatially defined sections of reality such as landscapes. The transmitter names the observation or measurement object as the carrier of the characteristics to be investigated, together with the quantitative methods to be applied to it. After these methods have been applied, numbers (observation or measurement data) describing the characteristics of the examined objects are available. The transmitter can present them to inform the receivers (scientists, society) about the object. The data can be used to calculate statistical measures and models and to test hypotheses. Finally, scientists present and interpret the results of these calculations with the help of linguistic signs in professional journals or books, which remain accessible to science and society for centuries thanks to excellent library archives. Digital data archives that enable the reuse of collected information have so far been rare and at best play a subordinate role in university teaching and research ().

Measurement means representing an empirical relational system (empirical relative) by a numerical relational system (numerical relative), i.e. a quantitative model, in a structure-preserving way. A relative is a set of objects or numbers together with their relations. The representation is structure-preserving if the numerical relations reflect the empirical ones. For this purpose, the properties of the empirical relative must be taken into account, which leads to the distinction between nominal, ordinal, interval and ratio scales (). The information content of the data and the choice of statistical methods depend on the scale level: the higher the scale level, the greater the information content and the number of applicable statistical methods (). The scale level must also be taken into account when modelling ecological structures and functions.
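The link between scale level and permissible statistics can be sketched as a lookup that only computes the measures of central tendency that are meaningful for a given scale. The assignment below follows the usual textbook classification of scale levels. The names `Scale`, `permissible` and `summarize` and the example values are hypothetical.

```python
import statistics
from enum import IntEnum


class Scale(IntEnum):
    """Scale levels, ordered by increasing information content."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest scale level at which each measure of central tendency is meaningful.
# median_low always returns an observed value, which suits ordinal codes.
MEASURES = {
    "mode": (Scale.NOMINAL, statistics.mode),
    "median": (Scale.ORDINAL, statistics.median_low),
    "mean": (Scale.INTERVAL, statistics.mean),
    "geometric_mean": (Scale.RATIO, statistics.geometric_mean),
}


def permissible(scale):
    """Names of the measures that may be computed for data on the given scale."""
    return sorted(name for name, (level, _) in MEASURES.items() if scale >= level)


def summarize(values, scale):
    """Compute only the measures that are permissible for the scale level."""
    return {name: MEASURES[name][1](values) for name in permissible(scale)}


# Hypothetical examples: land-use classes (nominal), damage classes (ordinal),
# concentrations (ratio).
print(summarize(["forest", "grassland", "forest"], Scale.NOMINAL))
print(summarize([1, 2, 2, 3], Scale.ORDINAL))
print(summarize([0.2, 0.4, 0.8], Scale.RATIO))
```

Because the scale levels are ordered, a higher level automatically inherits every measure permitted at the lower levels. This mirrors the statement that information content and the number of applicable methods grow with the scale level.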

Since neither ecosystems as elements of landscapes nor landscapes as spatial mosaics of ecosystems can be investigated in full, a hypothetical ecosystem or landscape model must be constructed before their structures and functions are measured. Models are simplified images of reality (). Statistical models, such as regression models, are used for the quantitative identification of ecosystem states. Dynamic models (simulation models) represent changes in ecosystem states, i.e. ecosystem dynamics, in a time-differentiated manner on the basis of the relevant ecosystem functions. If the measurement data were collected in a spatially differentiated manner, the results calculated with statistical or dynamic models can be mapped on this basis; maps are spatial models. In modelling, reality is simplified or abstracted according to the research interest or the operationalisability of the structures and functions of interest, with the aim of quantifying them and validating the model. This is done on the basis of the collected measurement data, which represent reality as captured by the methodology applied.
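A dynamic model in the sense described above can be as simple as a single-compartment mass balance. The sketch below assumes a stock M (for example, an element stored in a soil or biomass compartment) that receives a constant input D and loses material at a first-order rate k, so that dM/dt = D - k*M. It integrates this equation with the explicit Euler method. The model structure and parameter values are hypothetical and only illustrate time-differentiated simulation and its check against an analytical solution.

```python
import math


def simulate_storage(deposition, loss_rate, m0=0.0, dt=0.1, years=50.0):
    """Explicit Euler integration of dM/dt = deposition - loss_rate * M.

    deposition: constant input per year; loss_rate: first-order rate constant per year.
    Returns the simulated stock at t = 0, dt, 2*dt, ..., years.
    """
    steps = int(round(years / dt))
    m = m0
    series = [m]
    for _ in range(steps):
        m += dt * (deposition - loss_rate * m)
        series.append(m)
    return series


def analytical_storage(deposition, loss_rate, m0, t):
    """Closed-form solution, used to check the numerical model."""
    equilibrium = deposition / loss_rate
    return equilibrium + (m0 - equilibrium) * math.exp(-loss_rate * t)


# Hypothetical parameters: input of 2 units/year, rate constant 0.5/year, empty stock at t = 0.
series = simulate_storage(deposition=2.0, loss_rate=0.5)
t = 5.0
print(series[round(t / 0.1)], analytical_storage(2.0, 0.5, 0.0, t))
print("approximate steady state:", series[-1])
```

Comparing the simulation with the closed-form solution is a simple form of model verification. Validation against measured data, as required in the text, would be a separate step.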

Ecological modelling is often carried out using geographic information systems (GIS). To test the research hypothesis, data collections that quantify the sections of reality depicted in the models usually do not cover all elements of the investigated system (statistical population). Rather, they are limited to a subset of the population, a sample. For results based on the sample to apply to the population as well (induction), the selection procedure must ensure that the sample is representative of the population with respect to the system elements and properties covered by the model, including their spatial differentiation. Many procedures for selecting a representative sample can be distinguished (; , ; , ; ). In random selection (also called probability sampling or random sampling), each element of the population has the same non-zero probability of being included in the sample. This requires a complete list of all elements of the population. Strictly speaking, the methods of inferential statistics are valid only for random selections. Temporal and spatial autocorrelation are frequent in environmental data and can mean that the observations are not independent. In that case, the procedures of inferential statistics (hypothesis tests) are not appropriate unless the autocorrelation is taken into account. In systematic sampling, the sample elements are selected on the basis of inventories, from which a certain number of elements of a population are drawn according to rules to be defined; prior information on the cases to be selected is used here. Random sampling is frequently used ().
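The difference between random and systematic selection can be sketched on a complete sampling frame, which random selection requires. The sketch assumes a hypothetical list of 100 site identifiers. Systematic selection is implemented here as every k-th element after a random start, which is only one of the many possible selection rules.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; each has inclusion probability n / len(frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Take every k-th element of the ordered frame after a random start.

    Inclusion probabilities are exactly equal only if len(frame) is a multiple of n.
    """
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size exceeds the size of the sampling frame")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start within the first interval
    return frame[start::k][:n]


# Hypothetical sampling frame: a complete list of 100 candidate site identifiers.
frame = [f"S{i:03d}" for i in range(1, 101)]
print(simple_random_sample(frame, 10, seed=1))
print(systematic_sample(frame, 10, seed=1))
```

If the frame is ordered along an environmental gradient, the systematic sample spreads evenly across it. The same ordering can, however, interact with periodic spatial patterns, which is one reason why spatial structure matters for the sampling design.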

On the one hand, spatial structures such as landscapes are the result of biological, chemical and physical processes (functions). On the other hand, they can influence these processes. The spatial distribution of atmospheric deposition, for example, results from the interaction of the characteristics and location of emission sources, the chemical properties of the emitted substances, meteorological conditions, vegetation cover and relief. At the same time, the spatial differentiation of atmospheric deposition can influence the chemical and biological functions of plant populations and thus, in a differentiated manner, the species composition, i.e. the structure of ecosystems.

Spatial structures can take different forms (without a recognisable pattern: random; linear: gradient; clustered). In addition, spatial dependence and spatial autocorrelation are distinguished, although the boundary between the two definitions is sometimes difficult to operationalise (; , ; ; ; ; ). Nevertheless, landscape ecology aims to identify and explain spatial patterns. On the one hand, this means that spatial structures must be taken into account in data collection (sampling design), method selection and evaluation (statistical design, choice of model) (; ; ). On the other hand, it must be examined whether hypotheses about correlations between spatially structured phenomena may be erroneously confirmed or rejected due to spatial autocorrelation.
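Spatial autocorrelation is commonly quantified with Moran's I. The minimal sketch below computes the global statistic with binary distance-band weights, in which all points within a chosen distance count as neighbours. This is one of several weighting schemes, and the transect coordinates and values are hypothetical. Values above the expectation of -1/(n-1) indicate positive autocorrelation (similar neighbours, as along a gradient); values below it indicate negative autocorrelation.

```python
import math


def morans_i(values, coords, max_distance):
    """Global Moran's I with binary distance-band weights.

    w_ij = 1 if i != j and dist(i, j) <= max_distance, else 0.
    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2
    """
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    cross = 0.0
    w_total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_distance:
                cross += dev[i] * dev[j]
                w_total += 1.0
    variance_sum = sum(d * d for d in dev)
    if w_total == 0 or variance_sum == 0:
        raise ValueError("no neighbouring pairs or no variation in the values")
    return (n / w_total) * (cross / variance_sum)


# Hypothetical transect of 10 equally spaced plots (1 unit apart).
coords = [(float(x), 0.0) for x in range(10)]
gradient = list(range(10))   # linear gradient: neighbours are similar
alternating = [0, 1] * 5     # neighbours are dissimilar
print(morans_i(gradient, coords, max_distance=1.0))     # positive
print(morans_i(alternating, coords, max_distance=1.0))  # negative
print("expected under no autocorrelation:", -1 / (len(coords) - 1))
```

Repeating the calculation for increasing distance bands yields a correlogram. Along a gradient, such a correlogram shows the pattern described in the next paragraph: positive values at short and negative values at long distances.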

The application of inferential statistical methods presupposes that the data collected are independent. This is not the case if the data are autocorrelated. Autocorrelation occurs where measurements are taken as time series (temporal autocorrelation) or distributed over large areas or along environmental gradients (spatial autocorrelation). Ecosystem processes of larger spatial extent often show positive spatial autocorrelation at short distances, i.e. values of the investigated characteristic at nearby locations are positively correlated. Along a gradient, this positive autocorrelation at short distances is coupled with negative autocorrelation over long distances (). With positive spatial autocorrelation, the value measured at a location can be predicted to some extent from the values measured in its surroundings (). Because such a partly predictable value is not stochastically independent, each measured value does not contribute a full additional degree of freedom to statistical modelling (pseudoreplication) (). Some models therefore take into account the spatial autocorrelation of dependent and independent variables, which can be important for the validity of the models and of the underlying hypotheses.
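The loss of degrees of freedom through autocorrelation can be illustrated for the temporal case. The sketch assumes a first-order autoregressive (AR(1)) process. For such a process, the approximation n_eff = n(1 - rho)/(1 + rho) is commonly used to express how many independent observations an autocorrelated series of length n is worth when estimating its mean. Spatial data require more elaborate corrections, and the series used here is hypothetical.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of an equally spaced series."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def effective_sample_size(n, rho):
    """n_eff = n * (1 - rho) / (1 + rho), the usual AR(1) approximation for the mean."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1.0 - rho) / (1.0 + rho)


# Hypothetical monthly measurements with a smooth trend (strongly autocorrelated).
series = [10.0, 10.4, 10.9, 11.1, 11.8, 12.0, 12.6, 12.9, 13.5, 13.8, 14.1, 14.7]
rho = lag1_autocorrelation(series)
print(f"lag-1 autocorrelation = {rho:.2f}")
print(f"n = {len(series)}, effective n = {effective_sample_size(len(series), rho):.1f}")
```

A hypothesis test that treats all n values as independent would overstate the evidence. This is the pseudoreplication problem described above.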

Section 2 shows to what extent the research logic of empirical knowledge acquisition presented so far has already been realised in the RDM of the research projects of the Chair of Landscape Ecology at the University of Vechta. The study is part of a BMBF-funded project entitled Bottom-up Management Model for Establishing Institutional Research Data Management (RDM) in the Natural and Social Sciences (UniV-RDM). The project began in October 2017 under the BMBF programme Research into the Management of Research Data in their Life Cycle at Universities and Non-University Research Institutions.

2 An RDM project at the Chair of Landscape Ecology at the University of Vechta

2.1 Objectives and structure

As illustrated in Section 1 using the example of the research logic of empirical research in general, and of landscape ecology in particular, the sciences produce information in the form of quantitative data (measured values), texts, or optical and acoustic recordings. How, and according to which criteria, can such data be documented, archived and made available for exchange with other scientists? The UniV-RDM project, which aims to establish research data management (RDM) at the university, has been addressing these questions since autumn 2017. The disciplines Social Work and Landscape Ecology are involved in the exemplary investigation of RDM in the social and natural sciences and in transferring the findings to neighbouring disciplines. As a further project partner, the University Library is responsible for RDM-related legal and administrative issues and for setting up RDM infrastructures in cooperation with the computer centre.

Among other things, Subproject 1 aims to research the management of research data in their life cycle at the University of Vechta. Reflecting the project results achieved so far, the following sections cover the RDM inventory at the Chair of Landscape Ecology, the description of a typical data life cycle in landscape ecology and the presentation of RDM concepts for landscape ecology and beyond.

2.2 Materials and methods

The status quo survey of RDM activities at the Chair of Landscape Ecology is based on the expert knowledge of members of the Chair, publications of the Chair related to RDM, and technical documentation of the RDM infrastructure used at the Chair (subject databases and research data repositories/archives). It also draws on project records of research projects and expert reviews of qualification theses.

The central element of the survey was the exploration of the RDM expert knowledge available at the Chair of Landscape Ecology by means of a semi-structured interview procedure (oral survey, standardised questionnaire) on the following topics:

  • preparing and planning the research process (actors, data management plans, …),
  • data collection and processing (data sources, types, formats, processing software, …),
  • data description, documentation and metadata (guidelines and standards, identifiers, documentation software, …),
  • data storage, backup and archiving (responsibilities, data carriers, storage volume, archiving software, …),
  • data availability and data provision (demand, type of provision, …),
  • law and governance (guidelines, obligations, data protection, copyright, …), and
  • infrastructure and service (technical support, information, …).

Following the expert interviews, a typical data life cycle for landscape ecology was derived in accordance with the research logic described in Section 1 and refined using the above-mentioned materials. The RDM concepts typical of the Chair of Landscape Ecology were identified and documented on the basis of the interview results described below.

2.3 Results of the expert interviews

Data management planning at the Chair of Landscape Ecology is usually carried out in cooperation with research funders, colleagues, data producers, the computer centre and/or the Department of Research Development and Knowledge Transfer of the University of Vechta. In its research projects, the Chair mainly produces and uses quantitative data (mainly secondary data, partly primary data). These data originate mainly from field surveys, laboratory measurements and model calculations. Spatial data (raster and vector data) are characteristic of landscape ecology, but factual and image data are also used. 99% of these data are archived in digital form at the Chair. About 70% of the research data (RD) are documented for third parties using systematically named directory and file structures as well as semantic and structural metadata. For spatial data, ISO standard 19115 – ‘Geographic Information – Metadata’ () and the quality model of the German umbrella organisation for geoinformation (DDGI) () are considered as metadata standards (). RD are usually stored and backed up on data servers of the Chair or the library and on external storage media, and since 2017 also in external, publicly accessible research data repositories. The volume of RD is currently between 1 and 10 TB, with an upward trend. Data are backed up daily on the data servers and weekly to monthly on the external storage media. Demand for RD from third parties is currently relatively low (approximately twice per year). Data are provided via websites, cloud applications or e-mail. The Chair promotes the availability of RD in compliance with applicable usage restrictions.
The Chair's requirements for suitable scientific repositories include, in particular, long-term secure storage for at least 10 years, reasonable archiving costs, quality checks of the data, permanent addressability and citability of the data, the possibility to describe and index the data, visibility of the data and user-friendly access. As university RDM services, the Chair would welcome support with the publication and citation of research data, with legal questions (e.g. access restrictions, handling of sensitive data, use of licences) and with creating data management plans. It would also welcome measures ensuring that research data can be cited unambiguously, and technical solutions for collaborative work. The information services preferred by the Chair are personal consultations, training courses, tutorials and a website.

The typical data life cycle outlined in Figure 1 (Section 1.1) is based on the research logic identified in Section 1. Three methods developed and/or applied at the Chair of Landscape Ecology were identified. They are presented below as concepts supporting RDM processes in landscape ecology, i.e. they summarise the different practices of the people working at the Chair.

2.3.1 Subject databases

To implement the UNECE Convention on Long-range Transboundary Air Pollution (CLRTAP), the European Moss Survey (EMS) collects moss samples every 5 years from up to 7,300 sites in Europe, analyses them chemically and evaluates the results statistically. Mosses are used as bioaccumulation indicators of ecosystem pressures because they accumulate atmospheric deposition over time without uptake from the soil (, ). Germany participated in the EMS in 1990, 1995, 2000, 2005 and is currently participating in the 2015 survey. In 2000, 2005 and 2015, the Chair of Landscape Ecology coordinated the RDM, funded by the German Environment Agency. The EMS uses two systems to document and archive, in the long term, the measurement data and the quality of their collection and processing. The first is the open-source web application MossMet developed at the Chair. The second, in use since 2015, is MossMetEU, a derivative of MossMet adapted to the international requirements of the ICP Vegetation (). In this respect, MossMet(EU) parallels the repository of the Long Term Ecological Research (). In addition, the Data Management System (DMS) of the Moss Survey Coordination Centre at the Frank Laboratory of Neutron Physics, Joint Institute for Nuclear Research, Dubna, Russian Federation (), will be used. It will combine and store the measurement data of the entire European moss monitoring network, but not the sampling and site description data.

The development of MossMet and MossMetEU at the Chair of Landscape Ecology is based on extensive investigations of environmental monitoring networks in Germany. These networks were differentiated by administrative responsibility (federal states, federal government, EU), by environmental medium and, in some cases, within the individual measurement networks so defined. For each, essential information presented in Section 1 was collected, such as the geographical coordinates of measurement and sampling locations and the methods of data collection, documentation and archiving of measurement data (; ; ; , ,, , , ,; , ).

MossMet, used for the German contribution to the EMS, consists of a metadata component and a WebGIS component (,,,, , , ; ,,; ,,; ; ) based on an Apache web server with PostgreSQL as the database management system; Mapbender serves as the WebGIS client. MossMet enables data entry, modification, retrieval and download, and user access to the database is regulated by rights management. In addition, WebGIS functionalities have been implemented, such as buffer functions around sampling points and queries for identifiers, sampling points or metadata and measurement data. The European variant MossMetEU supports the Europe-wide collection of site- and sample-describing ‘metadata’ such as moss species, shoot length, soil type and surrounding land use. It also supports the maintenance of measurement data on concentrations of heavy metals and nitrogen in the mosses (Figure 2).
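How MossMet implements its buffer function is not described here. As a plain-Python sketch of the underlying idea only, the following code selects sampling sites within a given radius of a point using the great-circle (haversine) distance. The site identifiers and coordinates are hypothetical, and a database-backed WebGIS would normally perform this selection in the database itself.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding overshoot


def sites_within(sites, lat, lon, radius_km):
    """Identifiers of all sites inside a circular buffer around (lat, lon)."""
    return sorted(
        site_id
        for site_id, (s_lat, s_lon) in sites.items()
        if haversine_km(lat, lon, s_lat, s_lon) <= radius_km
    )


# Hypothetical sampling sites (latitude, longitude in decimal degrees).
sites = {
    "DE-001": (52.73, 8.29),
    "DE-002": (52.80, 8.29),
    "DE-003": (53.55, 9.99),
}
print(sites_within(sites, 52.73, 8.29, radius_km=10.0))
```

The haversine formula treats the Earth as a sphere. This is adequate for selecting sites around a point, though not for survey-grade distances.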

Figure 2 

Web application ‘MossMetEU’ for the collection and management of data of the European Moss Survey 2015.

Both database applications ensure the integration and quality assurance of the extensive moss data sets, which are collected by different editors for different moss surveys and participating states. On request, RD are made available to third parties via password-protected access and an export function; in individual cases, copyrights are protected through coordination with the participating states of the EMS.
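The quality assurance provided by such database applications can be illustrated with integrity constraints that reject implausible records on entry. The sketch uses Python's built-in `sqlite3` module rather than PostgreSQL. Its tables, columns and plausibility rules are a hypothetical simplification, not the MossMet(EU) data model.

```python
import sqlite3

# Hypothetical, simplified schema; not the actual MossMet data model.
SCHEMA = """
CREATE TABLE site (
    site_id   TEXT PRIMARY KEY,
    country   TEXT NOT NULL,
    latitude  REAL NOT NULL CHECK (latitude BETWEEN -90 AND 90),
    longitude REAL NOT NULL CHECK (longitude BETWEEN -180 AND 180)
);
CREATE TABLE sample (
    sample_id       TEXT PRIMARY KEY,
    site_id         TEXT NOT NULL REFERENCES site (site_id),
    survey_year     INTEGER NOT NULL CHECK (survey_year >= 1990),
    moss_species    TEXT NOT NULL,
    shoot_length_mm REAL CHECK (shoot_length_mm > 0),
    cd_mg_kg        REAL CHECK (cd_mg_kg >= 0)
);
"""


def connect():
    """Open an in-memory database with foreign-key enforcement switched on."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    con.executescript(SCHEMA)
    return con


def try_insert(con, sql, params):
    """Insert one row; return None on success or the constraint violation message."""
    try:
        with con:  # commits on success, rolls back on error
            con.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


SAMPLE_SQL = "INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)"
con = connect()
print(try_insert(con, "INSERT INTO site VALUES (?, ?, ?, ?)", ("DE-001", "DE", 52.73, 8.29)))
print(try_insert(con, SAMPLE_SQL, ("P-1", "DE-001", 2015, "Pleurozium schreberi", 35.0, 0.21)))
print(try_insert(con, SAMPLE_SQL, ("P-2", "DE-999", 2015, "Pleurozium schreberi", 35.0, 0.18)))  # unknown site
print(try_insert(con, SAMPLE_SQL, ("P-3", "DE-001", 2015, "Hypnum cupressiforme", 30.0, -0.1)))  # negative value
```

The constraints cover the completeness and consistency criteria named for database solutions. Missing mandatory fields, references to unknown sites and physically impossible values are rejected before they enter the data set.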

2.3.2 Scientific research data repositories

Two publicly accessible research data repositories are currently used by the Chair of Landscape Ecology. The first is PANGAEA – Data Publisher for Earth & Environmental Science® (www.pangaea.org), operated by the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research (AWI) and the Centre for Marine Environmental Sciences (MARUM) at the University of Bremen. The second is ZENODO® (www.zenodo.org), a data service funded by the European Commission. Both guarantee long-term storage of the data (at least 10 years) and support the assignment of Digital Object Identifiers (DOIs), so that the information objects provided can be unambiguously referenced and cited. Both allow digital resources to be described and indexed through publicly visible bibliographic metadata, provide user-friendly access to the resources and offer a choice of licences for their use. PANGAEA® includes a quality check of the data (important for publicly accessible data), whereas ZENODO® accepts data publications in both open and restricted access. ZENODO® also supports the archiving of software products, which are important for ensuring the replicability of scientific calculations.
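Because both repositories identify publications by DOIs, a small helper that normalises a DOI and builds its resolver URL can be useful when compiling data citations. The sketch assumes the regular expression commonly cited from Crossref, which matches most but not all DOIs. The record numbers are placeholders, whereas 10.5281 and 10.1594 are the DOI prefixes used by Zenodo and PANGAEA.

```python
import re

# Commonly cited pattern (published by Crossref); matches most, but not all, modern DOIs.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Normalise a DOI string and return its resolver URL; raise ValueError if malformed."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a syntactically valid DOI: {doi!r}")
    return RESOLVER + doi


# Placeholder record numbers with the real Zenodo and PANGAEA prefixes.
print(doi_to_url("doi:10.5281/zenodo.1234567"))
print(doi_to_url("https://doi.org/10.1594/PANGAEA.123456"))
```

A syntax check only confirms that a string looks like a DOI. Whether it is registered can only be verified by resolving it.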

The Chair of Landscape Ecology pursues two publication strategies: data publications as supplementary material to scientific text publications () and special data papers (; ). To protect existing copyrights and user agreements with the data providers, only limited access for replication purposes is granted for some of the research data (). In addition, methodological procedures are made accessible, i.e. software products developed primarily in the research process (scientific software). They are published either as part of the above-mentioned data publications (above all scripts) or as independent software products of higher quality () (open source/open access). The archive files linked to the Chair's data publications follow a largely standardised structure with the following elements:
- documentation of the standard software used (e.g. product name, version, licence type, web link, name of the executable file);
- documentation of specially developed scientific software (e.g. commented source code, user documentation, dynamic-link libraries);
- documentation of the input and output data of the scientific calculations;
- derived products (e.g. maps, diagrams).

Current and complete documentation of the data and software publications is an integral part of the homepage of the Chair of Landscape Ecology at the University of Vechta.
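The standardised structure of these archive files can be mirrored in a machine-readable manifest. The following sketch encodes the elements listed above (standard software with product name, version, licence type, web link and executable; scientific software; input and output data; derived products) as Python dataclasses and flags empty sections. The class and field names and the example entries are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StandardSoftware:
    """Documentation entry for a standard software product used in the analysis."""
    product_name: str
    version: str
    license_type: str
    web_link: str
    executable: str


@dataclass
class ArchiveManifest:
    """Hypothetical top-level structure of a publication-related archive file."""
    title: str
    standard_software: list = field(default_factory=list)    # StandardSoftware entries
    scientific_software: list = field(default_factory=list)  # commented scripts, libraries, docs
    input_data: list = field(default_factory=list)
    output_data: list = field(default_factory=list)
    derived_products: list = field(default_factory=list)     # e.g. maps, diagrams

    def missing_sections(self):
        """Names of sections that are still empty, i.e. documentation gaps."""
        return [name for name, value in asdict(self).items() if isinstance(value, list) and not value]

    def to_json(self):
        return json.dumps(asdict(self), indent=2)


manifest = ArchiveManifest(
    title="Hypothetical moss survey analysis",
    standard_software=[StandardSoftware("R", "4.3.1", "GPL-2 | GPL-3", "https://www.r-project.org", "Rscript")],
    scientific_software=["kriging_cd.R", "README.md"],
    input_data=["moss_cd_2015.csv"],
)
print(manifest.missing_sections())
print(manifest.to_json())
```

Storing the manifest as JSON next to the data keeps the documentation readable by both people and scripts. The completeness check can be run before a data publication is submitted.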

The citable bibliographic metadata sheets of the research data repositories are to be regarded as data publications. Typically, however, bibliographic metadata lack (landscape-ecological) specificity, so they may need to be supplemented by elements of other metadata standards (). For the mostly spatial data of landscape ecology, ISO Standard 19115 – ‘Geographic Information – Metadata’ () is used. Their quality features are described with the quality model of the German Umbrella Association for Geoinformation e.V. (DDGI) () or with EN ISO 19157 (). ISO 19115 is widely used but difficult to implement, so the DDGI has provided a more practicable metadata standard based on it. In addition to the bibliographic metadata, two further kinds of metadata are integrated into the above-mentioned archive files. The first is mainly structural metadata (above all the designation and short description of the classes and their attributes). The second is quality-related metadata (thematic accuracy, positional accuracy, completeness, consistency etc.).
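Combining bibliographic, structural and quality-related metadata can be supported by a simple completeness check. The sketch below uses hypothetical, simplified element names loosely modelled on the elements mentioned above (bounding coordinates, attribute descriptions, thematic and positional accuracy, completeness, consistency). It does not implement the ISO 19115 or ISO 19157 schemas, and the record contents are placeholders.

```python
# Hypothetical, simplified element names grouped by metadata type.
REQUIRED = {
    "bibliographic": ["title", "creators", "publication_year", "doi"],
    "extent": ["west", "east", "south", "north"],
    "quality": ["thematic_accuracy", "positional_accuracy", "completeness", "logical_consistency"],
}


def missing_elements(record):
    """List 'section.element' paths that are absent or empty in a metadata record."""
    missing = []
    for section, elements in REQUIRED.items():
        values = record.get(section, {})
        for element in elements:
            if values.get(element) in (None, "", []):
                missing.append(f"{section}.{element}")
    return missing


def undocumented_attributes(record, columns):
    """Structural metadata check: data columns without a short description."""
    described = record.get("attributes", {})
    return [c for c in columns if not described.get(c)]


record = {
    "bibliographic": {"title": "Cd in moss (example)", "creators": ["N.N."],
                      "publication_year": 2020, "doi": ""},  # DOI not yet assigned
    "extent": {"west": 5.9, "east": 15.0, "south": 47.3, "north": 55.1},
    "attributes": {"site_id": "Sampling site identifier",
                   "cd_mg_kg": "Cd concentration in moss, mg/kg dry weight"},
    "quality": {"thematic_accuracy": "placeholder", "positional_accuracy": "placeholder",
                "completeness": "placeholder", "logical_consistency": "placeholder"},
}
print(missing_elements(record))
print(undocumented_attributes(record, ["site_id", "cd_mg_kg", "shoot_length_mm"]))
```

Such a check cannot judge the content of a metadata element. It only makes gaps visible before an archive file is published.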

2.3.3 Vechtaer Open Access Document Server (VOADo) and VSpace – Internal Document Server of the University of Vechta

The web-based platform VOADo of the University of Vechta serves to publish, provide and archive scientific documents and associated RD. All documents are assigned a digital object identifier (DOI) and can thus be found and accessed worldwide in the spirit of Open Access. At the Chair of Landscape Ecology, VOADo has so far been used for two projects (; ). In addition, the University Library offers VSpace, a document server on which university members can publish student works and teaching materials for teaching and research purposes. After logging in, authorised users can access all documents via the VSpace search function. For RDM purposes, users can store text documents, presentations etc., as well as RD and software tools, in file archives in VSpace, combine them into collections and link them to other documents. For both VOADo and VSpace, the responsible experts of the University of Vechta have developed guidelines for students and scientists on publishing documents and data.

3 Summary and conclusions

In the UniV-RDM project, RDM activities are to be further developed, systematised and expanded university-wide on the basis of an initiated university policy discourse on RDM. The university’s RDM culture is to be strengthened by public relations measures aimed at motivating all stakeholders and raising their awareness of RDM-related legal and administrative issues. In the future, RDM activities should be coordinated centrally within an overall institutional structure involving external partners. Appropriate infrastructures for the respective data holdings are to be developed, provided and/or communicated. These will be based on the status quo survey and on standards for RD and data management plans (DMPs), taking into account local requirements and cost-benefit assessments. Standardised RDM concepts are to be used to establish target-group-specific training courses that build RDM skills and to draft institutional RDM rules that safeguard RDM at university level.

The advantages of database solutions (Section 2.3.1) over file-based solutions (Sections 2.3.2 and 2.3.3) lie mainly in better data quality assurance (criteria: completeness, consistency) through database concepts (e.g. constraints) and well-defined data structures. Above all, this facilitates the integration of spatially and temporally heterogeneous surveys (for example, six moss monitoring campaigns since 1990 in up to 36 participating states). The main disadvantage is the comparatively high cost of developing specific database solutions. MossMet(EU), for example, was developed solely as a platform for data collection, data exchange and QA/QC tailored to the requirements of the Moss Survey of the UNECE ICP Vegetation. Research data repositories, in contrast, offer cost-effective standard functionality for archiving and documenting a very wide variety of data collected for different purposes. Their major disadvantage is generally lower semantic and syntactic interoperability. In other words, the subject databases are geared to the special requirements of landscape ecology in a narrowly defined domain (here: moss monitoring). The publicly accessible repositories used by the Chair of Landscape Ecology (PANGAEA®, ZENODO®), by contrast, fulfil the general core requirements of RDM in landscape ecology: reusability, replicability, citability and visibility. The same applies to the institutional repositories (VOADo, VSpace), which are available free of charge only to university members. However, the effort required to prepare data publications, especially the publication-related data archives chosen here, is high and should be quantified further as a basis for creating the necessary incentive systems.