Introduction

In the 21st century, digital data drive innovation and decision-making in nearly every field (, , , , , , , , , ). Key questions today center not on whether data can add value in these areas, but rather on how to obtain access to more data, how best to leverage data given concerns about privacy and security, and how much value can be gained by reusing data (, , , and , , , , ). These questions are being raised in both the public and private sectors, where stakeholders increasingly see data as an asset that can be leveraged to spur creativity, innovation and economic growth, as well as to increase trust (e.g., in government or the results of scientific research) (, BDVA 2014, , , , , , , , , , , , ).

Desires for greater transparency and access to data have been particularly high for data 1) that are produced at public expense, whether as part of publicly-funded research or other public initiatives, and 2) that are used in or produced as part of sponsored scholarly inquiry (“sponsored research data” or “research data”), whether publicly or privately funded. Demand for access to the former is driven by public interest and opportunity for public benefit (, , , , , , , , ). Demand for access to the latter is driven by public interest and principles of scholarship, especially those that advocate for open availability of knowledge to support further inquiry (, , , , , , , ). Demand is also driven by a desire for increased reproducibility and accountability. Several high-profile cases of missing supporting data or of suspected or actual fraud have brought greater scrutiny to the availability of data to replicate and verify research results (see for example , , ).

The demand for greater access to sponsored research data has focused attention on the chain of activities that lead to data access, including what data are saved by those who create them, where and how those data are stored and preserved, how they are described, what support for their reuse is available, and how they can be discovered and accessed. Taken together, these activities to maintain the integrity of and preserve access to data are commonly known as data stewardship. The National Academies, in a 2009 study (), defines stewardship as:

“…the long-term preservation of data so as to ensure their continued value, sometimes for unanticipated uses. Stewardship goes beyond simply making data accessible. It implies preserving data and metadata so that they can be used by researchers in the same field and in fields other than that of the data’s creators. It implies the active curation and preservation of data over extended periods, which generally requires moving data from one storage platform to another. The term “stewardship” embodies a conception of research in which data are both an end product of research and a vital component of the research infrastructure.”

The importance of data stewardship to leveraging sponsored research data for a variety of purposes, both now and in the future, is reflected in the initiatives and policies that have been created in recent years in countries around the world, particularly in the United States and Europe, but in other places as well, to increase access to sponsored research data (, , , , OECD 2007, OMB 2002, OMB 2013, , , ).

While there is general agreement about the actions that must be taken and roles that must be played to steward research data, there is a lack of clarity about who should have responsibility for fundamental aspects of stewardship, and different understandings of what constitutes effective stewardship. This contributes to a fractured and diffuse environment for stewardship (, ; see also , , , , , , , , ). In fact, despite the large number of data repositories, stewardship initiatives, and policies across the research data landscape, we know relatively little about the total amount, characteristics, or sustainability of stewarded research data (, , , , Pienta 2006, , ).

What we do know gives us pause. For instance, Read et al. studied the number of datasets mentioned in journal articles resulting from National Institutes of Health (NIH)-funded research that were deposited in “well-known, publicly-accessible data repositories” (). They found mentions of such deposit in only 12% of published articles. Based on the number of datasets identified in articles where no deposit was mentioned, they estimated that between 200,000 and 235,000 datasets resulting from NIH-funded research in 2011 were “invisible” (not found in one of the well-known repositories). The PARSE.Insight project found similar results in its wide-ranging study to “gain insight into the practices, needs and requirements of research communities” (). In a survey of more than 1,300 researchers across multiple disciplines, only 20% of respondents reported depositing data in a digital archive ().

This research raises important questions. Are stewardship arrangements sufficient? Do researchers, research sponsors, and research institutions adequately understand what they need to do? Are public policies appropriate? These are questions worth answering.

The starting point for answering these questions is the substantial published literature on research data stewardship. In this article we explore what is known about research data stewardship through the lens of that literature, characterizing the important questions that previous researchers have asked and identifying areas that will require additional research. We return to the unanswered questions at the end of the article to propose lines of future research.

Data Collection and Analysis

Our literature review explores three different samples of literature, which we used to conduct the different analyses presented in this paper. For the purposes of developing our samples, we defined data stewardship according to the National Academies definition provided above. Likewise, we defined “sponsored research data” or “research data” as data used in or produced as part of sponsored scholarly inquiry, whether publicly or privately funded.

The first sample (Sample A) is a body of 87 works, including literature reviews, reports, and empirical research that we analyzed to discover what scholars and practitioners identify as challenges to data stewardship. A list of these works can be found in York et al. ().

We conducted descriptive coding of this sample, from which we identified three levels of stewardship gap “areas” and “sub-areas.” At the highest level we defined six gap areas, which we arrived at by distilling 14 broader gap areas, themselves aggregated from 56 more granular gap sub-areas. The areas and sub-areas are described below. The different levels of gap areas and sub-areas, as well as the papers we identified them from, are available to browse at York et al. ( and ).
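The roll-up from the 14 gap areas to the six top-level categories can be sketched as a simple mapping. The mapping below follows the aggregation listed later in this section; the counts passed to `roll_up` are illustrative placeholders, not our actual tallies (in our data a single study can fall into multiple areas, so real totals are deduplicated rather than summed):

```python
# Mapping of the 14 stewardship gap areas to the six top-level
# categories used in parts of our analysis.
AREA_TO_CATEGORY = {
    "Culture": "Culture",
    "Legal/Policy": "Culture",
    "Knowledge": "Knowledge",
    "Responsibility": "Responsibility",
    "Commitment": "Commitment",
    "Human Resources": "Resources",
    "Infrastructure and Tools": "Resources",
    "Funding": "Resources",
    "Curation, Management, and Preservation": "Stewardship Actions",
    "Sustainability Planning": "Stewardship Actions",
    "Collaboration": "Stewardship Actions",
    "Sharing and Access": "Stewardship Actions",
    "Discovery": "Stewardship Actions",
    "Reuse": "Stewardship Actions",
}

def roll_up(area_counts):
    """Aggregate per-area counts into the six top-level categories."""
    totals = {}
    for area, n in area_counts.items():
        category = AREA_TO_CATEGORY[area]
        totals[category] = totals.get(category, 0) + n
    return totals
```

For example, `roll_up({"Funding": 2, "Human Resources": 3})` yields `{"Resources": 5}`.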

The second sample (Sample B) comprises 74 works selected out of the 87 works from Sample A. These are listed at York et al. (). In addition to identifying challenges to data stewardship, the authors of these 74 works also identified relationships between the challenges (e.g., challenges that cause or exacerbate others). The data in Figures 1 and 2 refer to this sample.

Figure 1 

Statements about gap areas and the relationships between them.

Figure 2 

Gap areas and relationships between them. The figure is arranged to show that gap areas in each column impact the gap areas in the rows below them. For instance, Culture (in the first column) impacts Knowledge, Commitment, Legal and Policy Issues, etc. (the gap areas in the rows of that column).

The third sample (Sample C) is a set of 142 works, some of which are included in Sample A and Sample B, that explicitly seek to measure stewardship gap areas and sub-areas, or articulate metrics for measuring them. These are listed at York et al. (). Sample C excludes reports and other works that, for instance, articulate strategies or ideas for addressing stewardship challenges but do not conduct empirical research to measure at least one of our identified gap areas, or theoretical research to identify what might be measured.

We limited works in Sample C largely to those dealing explicitly with research data (as opposed, for example, to preservation of digitized cultural materials), with a few exceptions. The exceptions include studies that investigated the total amount of digital information (e.g., and , , ), studies targeted toward digital curation skills broadly (but that include consideration for research data) (e.g., , ), and some studies that investigated public sector or government information (e.g., , ).

We conducted initial stages of coding using a combination of spreadsheets and the Web-based tool Workflowy. We subsequently kept track of article codes using spreadsheets and a Web-based database platform (Drupal) where data from the project are available (see http://www.stewardshipgap.net; for data in tabular form see ).

We identified the works in all three samples through a variety of methods, including searching for topics related to stewardship and curation in and across databases (e.g., using services such as Google Scholar and cross-database aggregation services such as Summon) and analyzing cited references in relevant articles, reports, and projects. The works have a geographic bias toward North America and Europe and a language bias toward English. We describe our analyses using these samples below.

Defining the Stewardship Gap

Identifying Gap Areas

While numerous studies and reports have defined data stewardship, identified stewardship needs, put forth strategies to improve stewardship, and undertaken measurement and analysis of key factors that contribute to data stewardship (described below), no community-wide metrics for or measurements of the stewardship gap as a whole exist (one method for identifying the existence of a stewardship gap is described in ).

Measuring the stewardship gap is complex not only because it is difficult to measure the amount of sponsored research data that exist, but also because a simple quantified measure of data would not provide critical information about the stewardship environment, prospects for stewardship, or other indicators that could yield insight into the likelihood that data will be stewarded in either the short or long term. Measuring the stewardship gap involves taking stock of a wide variety of component issues or “gaps” and the ways these interrelate and affect one another.

We show the scale of the issue in Table 1, in which we identify 14 gap areas, drawn from 87 articles, reports, and other works related to data stewardship (Sample A).

Table 1

Stewardship gap areas, descriptions.

Gap Area: Description

Culture: Gap arising from differences in attitudes, goals, practices, and priorities among disciplines and communities that have an impact on data stewardship and reuse
Legal/Policy: Gap between current regulations and policies that govern data stewardship and reuse and those that would maximally facilitate stewardship and reuse
Knowledge: Gap between what is known and what needs to be known to effectively plan for and ensure effective data stewardship
Responsibility: Gap between who currently has responsibility for stewardship and who is best placed to steward data over time
Commitment: Gap between the stewardship commitments that exist on valuable data and the commitments necessary to ensure long-term preservation and access
Human Resources: Gap between the human effort and skills needed to steward and make data accessible, and the effort and skilled workers that are available
Infrastructure and Tools: Gap between the infrastructure available to steward and reuse data and the infrastructure needed to maximize stewardship and reuse capabilities
Funding: Gap between the funding needed for effective stewardship and the funding available
Curation, Management, and Preservation: Gap between the ways data are managed and prepared for preservation and reuse and the ways that would maximize their potential for preservation and reuse
Sustainability Planning: Gap between the planning that is done to ensure adequate resources for stewardship and the planning that is needed
Collaboration: Gap between the collaboration needed for effective stewardship and the collaboration that takes place
Sharing and Access: Gap between the amount of data that is shared or made accessible and the amount that is not
Discovery: Gap between the amount of accessible data that is discoverable and the amount that is not
Reuse: Gap between the data that are available for reuse and the data that are used

While in this paper we discuss all 14, in some of our analyses we combined them into the six categories listed below:

  1. Culture (including Legal and Policy Issues)
  2. Knowledge
  3. Responsibility
  4. Commitment
  5. Resources (including Infrastructure and Tools, Human Resources and Funding)
  6. Stewardship Actions (including Curation, Management and Preservation, Sustainability Planning, Collaboration, Sharing and Access, Discovery, and Data Reuse)

Further information about each gap area is provided in the Appendix.

Identifying Relationships Between Gaps

Many of the articles and reports that we examined also indicate a relationship between gap areas—for instance, that deficiencies or gaps in policies for archiving data affect the quantity of data that are shared. Examples of statements indicating such relationships are shown in Figure 1. The arrow indicates the direction of the relationship. As the fourth and fifth statements indicate, the influences are not always unidirectional (e.g., Knowledge can affect Sustainability Planning and vice versa).

Figure 2 shows the relationships between the 14 gap areas as identified from nearly 300 relationship statements like the ones above within 74 of the 87 works we reviewed (Sample B). The figure is arranged to show that gap areas in each column impact the gap areas in the rows below them. For instance, Culture (in the first column) impacts Knowledge, Commitment, Legal and Policy Issues, etc. (the gap areas in the rows of that column). The relationships shown are direct relationships drawn from the statements, and not something that we have inferred. One might infer, for example, that legal and policy issues would have an impact on how much we know in certain areas, or who is responsible for which aspects of stewardship. Since these relationships are not explicitly indicated in the literature, however, they are not represented here. The figure, then, does not attempt to represent comprehensive or definitive relationships between the gap areas. It does, however, represent what has been written about in a fairly large sample of widely cited literature about research data stewardship.

Horizontal rows with significant amounts of red indicate areas where many factors are at play. For instance, many factors affect funding for data stewardship, the seventh row from the top (e.g., Culture, Knowledge, Responsibility, Commitment, etc.). Rows with significant white space indicate areas that may be difficult to address because few factors have been identified that influence them. For instance, Responsibility is shaped by Collaboration and Commitment, but few factors affect Collaboration and Commitment themselves (and two of the factors that affect Commitment are bi-directional relationships with Responsibility and Collaboration).

One finding from this analysis is that many gap areas that have the largest impact on other areas are affected by relatively few factors. This implies that changes in such areas, including Collaboration, Culture, Knowledge, Responsibility and Commitment, could benefit data stewardship, but may also be difficult to effect. On the positive side, our analysis shows that there are at least some factors that do influence these gaps (e.g., Collaboration is impacted by Infrastructure and Tools and Culture by Funding and Legal and Policy Issues) and these factors could potentially be leveraged in efforts to change the size and nature of some gaps.
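The pattern described above amounts to comparing out-degree (how many areas a gap area impacts) with in-degree (how many areas impact it) in the directed graph of coded relationship statements. A minimal sketch, using a small hypothetical subset of edges rather than our full set of nearly 300 statements:

```python
from collections import defaultdict

# Each (source, target) pair stands for one literature statement of the
# form "source gap area impacts target gap area". These edges are a
# hypothetical subset for illustration, not our actual coded data.
statements = [
    ("Culture", "Sharing and Access"),
    ("Culture", "Knowledge"),
    ("Funding", "Culture"),
    ("Infrastructure and Tools", "Collaboration"),
    ("Collaboration", "Responsibility"),
    ("Commitment", "Responsibility"),
]

out_degree = defaultdict(int)  # number of areas this area impacts
in_degree = defaultdict(int)   # number of areas that impact this area

for source, target in statements:
    out_degree[source] += 1
    in_degree[target] += 1

# Areas with high out-degree but low in-degree are influential yet offer
# few levers for change, the pattern noted above for Culture,
# Collaboration, Knowledge, Responsibility, and Commitment.
areas = set(out_degree) | set(in_degree)
leverage = {a: out_degree[a] - in_degree[a] for a in areas}
```

The same counting, applied to the full set of statements, underlies the row and column patterns visible in Figure 2.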

A second significant finding is the scarcity of references to factors that have an impact on Discovery of data, or vice versa. Discovery is only mentioned in a couple of contexts in the literature, mainly in connection with infrastructure (e.g., that infrastructure is needed for discovery). Many sources talk about curation, management and preservation influencing reuse of data, but skip the step of how it is made known that data are available for reuse.

Our effort to define the stewardship gap leads us to believe that there are multiple gaps, that the gaps are not isolated from one another but relate to and impact each other in different ways, and that while a number of such relationships have been identified in the literature, some are better understood than others (e.g., more is known about the factors that affect Infrastructure and Tools than about those that affect Discovery). It follows that developing effective strategies to address apparent stewardship gaps will depend on an analysis of the gap areas most relevant in a particular context, an understanding of which other gap areas could be targeted to help reduce or eliminate the observed gaps, and reliable means of measuring the extent of the gaps (in order to calibrate levels of investment). We turn our attention now to the last of these: means of measurement.

Stewardship Gap Measurements and Metrics

How do we measure the stewardship gap? The stewardship literature includes many studies that define ways to measure the gap (“metrics”), or that actually measure the gap itself (“measurements”). For the purposes of our investigation, we considered studies to be measurements if they gathered information relevant to a stewardship gap area (whether through case studies, interviews, surveys, ethnography, or another method), and to develop or articulate metrics if they stated criteria that could be used as a basis for measurement. The value of measurements is that they help us understand specific attributes of stewardship that can be measured, for example the amount of resources or the size of archives; the value of metrics is that they help us understand how to measure stewardship or define measurements.

Our initial review of the literature led us to the 14 gap areas, described in the previous section, that are relevant to the stewardship gap. Below we discuss measurement of these areas across the literature, following some examples of what we considered to be measurement and metrics studies.

Examples of Measurement and Metrics Studies

Fecher et al.’s () article “What Drives Academic Data Sharing” is an example of a study that includes both measurements and metrics. Fecher and colleagues describe a framework for understanding data sharing in academic settings, which we consider metrics. The framework comprises six categories of factors that contribute to data sharing. These are, as described in the paper:

  • Data donor, comprising factors regarding the individual researcher who is sharing data (e.g., invested resources, returns received for sharing)
  • Research organization, comprising factors concerning the crucial organizational entities for the donating researcher, being their own organization and funding agencies (e.g., funding policies)
  • Research community, comprising factors regarding the disciplinary data-sharing practices (e.g., formatting standards, sharing culture)
  • Norms, comprising factors concerning the legal and ethical codes for data sharing (e.g., copyright, confidentiality)
  • Data recipients, comprising factors regarding the third party reuse of shared research data (e.g., adverse use)
  • Data infrastructure, comprising factors concerning the technical infrastructure for data sharing (e.g., data management system, technical support)

In order to develop the framework, Fecher and colleagues conducted a systematic review of the literature and a survey of secondary data users, which we consider measurement. In their research, they explored questions such as why researchers do not share data, what returns or awards are received from data sharing, whether data sharing is encouraged by employers or funding agencies, what would motivate researchers to share data, and what value is gained from data sharing. They related the results from their survey to findings of other studies on data sharing in order to build the data sharing framework, which they believed had both theoretical and practical use. Their findings indicated that, in contrast to theoretical representations of open science or crowd science, “[r]esearch data is in large parts not a knowledge commons.” Their results pointed to “a perceived ownership of data (reflected in the right to publish first) and a need for control (reflected in the fear of data misuse). Both impede a commons-based exchange of research data.” This finding, they argued, had practical implications for policy:

“Considering that research data is far from being a commons, we believe that research policies should work towards an efficient exchange system in which as much data is shared as possible. Strategic policy measures could therefore go into two directions: First, they could provide incentives for sharing data and second impede researchers not to share.” ()

Overall, they argued that their framework helped “to gain a better understanding of the prevailing issues and [provide] insights into underlying dynamics of academic data sharing” ().

Fecher et al.’s study is unusual in addressing both measurement and metrics; we found only one other study, “A game theoretic analysis of research data sharing” by Pronk et al. (), that articulated metrics for data sharing. This study describes a game theoretic model in which there is a cost associated with sharing datasets and a benefit associated with reusing them. The model includes parameters such as the time-cost to prepare a dataset for sharing and for reuse, the benefit of gaining citations, the probability of finding an appropriate dataset to reuse, and the percentage of scientists sharing their research data. The authors ran simulations with varying parameter values and found that not sharing data is always the best option for researchers individually; however, both researchers who share data and those who do not are better off when more researchers share, since more researchers can then gain the benefits associated with reusing data. Pronk et al. note that this is a classic example of the prisoner’s dilemma. They conclude from their experiments that introducing a “citation benefit” for papers accompanied by a shared dataset is a more effective means of incentivizing and increasing rates of sharing than, for instance, reducing the costs of data sharing or making sharing obligatory through policies.
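The incentive structure Pronk et al. describe can be illustrated with a toy expected-payoff calculation. The parameter names and values below are our own illustrative assumptions, not the paper’s actual model or numbers; the sketch only reproduces the prisoner’s-dilemma shape of the argument:

```python
# Toy sketch of the data-sharing incentive structure; all parameter
# names and default values are illustrative assumptions.
def expected_payoff(shares, frac_sharing, *,
                    share_cost=2.0,       # time-cost of preparing data for sharing
                    reuse_benefit=1.0,    # benefit of reusing a found dataset
                    citation_benefit=0.0, # extra benefit when shared data is cited
                    find_prob=0.5):       # chance of finding a reusable dataset
    """Expected payoff for one researcher, given the community sharing rate."""
    # Everyone benefits from the pool of data others share.
    payoff = frac_sharing * find_prob * reuse_benefit
    if shares:
        payoff += citation_benefit - share_cost
    return payoff

# Not sharing dominates individually...
assert expected_payoff(False, 0.5) > expected_payoff(True, 0.5)
# ...yet everyone is better off when more of the community shares:
assert expected_payoff(False, 0.9) > expected_payoff(False, 0.1)
# A sufficient citation benefit flips the individual incentive:
assert expected_payoff(True, 0.5, citation_benefit=3.0) > expected_payoff(False, 0.5)
```

With these toy numbers, declining to share dominates individually, everyone’s expected payoff rises with the community sharing rate, and a sufficiently large citation benefit flips the individual incentive, mirroring the paper’s conclusion.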

The majority of studies regarding data sharing concentrated on measurement alone, focusing on attitudes towards data sharing, whether and how data are shared, limits on data sharing (e.g., privacy, intellectual property, or security concerns), incentives for data sharing, and problems encountered when trying to share data.

Measurement and Metrics Across the Literature

Table 2 shows the number of studies, reports, and projects (hereafter referred to as “studies”) out of the 142 investigated in Sample C that either measure or provide metrics for measuring aspects of the stewardship gap. The 14 gap areas described above contain 56 distinct gap sub-areas and aggregate into our final six areas; the six areas, 14 gap areas, and 56 sub-areas are represented as Level 1, Level 2, and Level 3, respectively, in Table 2. We identified some type of study (related to either measurement or metrics) in 48 of the 56 sub-areas.

Table 2

Three levels of coding of gap areas and sub-areas and how many studies for each we identified in the literature. We identified some works as both measurement and metrics studies and some fell into multiple gap areas and sub-areas. The rows with totals include the distinct number of studies (out of 142) in each Level 1 gap area.

Level of coding aggregation: Level 1 > Level 2 > Level 3 (gap sub-area)
Counts per sub-area: Measurement Studies / Metrics Studies

Culture
  Culture
    Sharing attitudes and practices: 45 / 2
    Standards: 8 / 0
    Research and development culture: 6 / 0
    Evaluation of quality: 5 / 10
    Stewardship priority: 2 / 0
    Demand for data: 1 / 0
    Data definition: 1 / 0
    Intellectual property: 1 / 0
    Archive mandates and objectives: 0 / 0
    Identifying what is valuable: 11 / 5
  Legal and Policy
    Lack of consistency and alignment: 11 / 3
    Deficiencies that inhibit stewardship, access, and use: 10 / 0
    Institutional structures and pressures: 6 / 1
    Incentives that support stewardship, access, and use: 5 / 1
  Culture Measurement and Metrics Study Total: 77

Knowledge
  Knowledge
    Amount of data: 27 / 5
    Costs of stewardship: 14 / 10
    Infrastructure for stewardship: 2 / 0
    Where to deposit data: 2 / 0
    Challenges of enabling data reuse: 1 / 0
    How to preserve: 1 / 0
    Provenance and authenticity: 0 / 3
    Reuse possibilities: 0 / 0
  Knowledge Measurement and Metrics Study Total: 47

Responsibility
  Responsibility
    Conduct stewardship activities: 9 / 8
    Coordinate stewardship activities: 1 / 1
    Support stewardship activities: 1 / 7
  Responsibility Measurement and Metrics Study Total: 18

Commitment
  Commitment
    Lack of commitment: 1 / 1
    Extent of commitment: 1 / 1
    Duration of commitment: 0 / 1
  Commitment Measurement and Metrics Study Total: 2

Resources
  Human Resources
    Lack of skills: 19 / 4
    Lack of support for data management: 10 / 0
    Lack of people: 5 / 0
    Uneven distribution of skills: 2 / 0
    Unequal access to resources and expertise: 0 / 0
  Infrastructure and Tools
    Lack of infrastructure: 19 / 2
    Lack of tools: 16 / 1
    Difficulty meeting generalized and special needs: 2 / 0
    Different timescales of infrastructure development and maturity: 0 / 0
  Funding
    Lack of funding: 12 / 0
    Imbalance in funding: 0 / 0
  Resources Measurement and Metrics Study Total: 37

Actions
  Curation, Management, and Preservation
    Fragmented data management: 21 / 0
    Insufficient data curation or management: 18 / 7
    Difficulty managing data for reuse: 14 / 2
    Difficulty establishing the trustworthiness of curated data: 1 / 2
    Difficulty maintaining the integrity of data over time: 1 / 9
    Tradeoffs between data management for short or long term: 0 / 0
  Sustainability Planning
    Business and economic models: 5 / 8
    Dynamic and adaptable infrastructure: 4 / 0
    Lack of strategy and planning: 4 / 0
    Design and staffing of organizations: 3 / 2
  Collaboration
    Lack of collaboration: 3 / 0
    Challenges forming partnerships: 2 / 0
    Support structures: 0 / 0
    Lack of critical mass: 0 / 0
  Sharing and Access
    Sharing and Access: 36 / 7
  Discovery
    Discovery: 7 / 0
  Reuse
    Reuse: 26 / 2
  Actions Measurement and Metrics Study Total: 87

Many studies were relevant to more than one gap area. The overall distribution of studies is as follows: Culture: 77; Knowledge: 47; Responsibility: 18; Commitment: 2; Resources: 37; Actions: 87.

We did not find any measurement studies in the following areas in our sample: tradeoffs between data management for the short or long term; lack of critical mass for collaboration; support structures for collaboration; duration of commitment (one metrics study); archive mandates and objectives; provenance and authenticity; reuse possibilities; imbalance in funding; unequal access to resources and expertise; and different timescales of infrastructure development and maturity. As Table 2 shows, there were even more areas for which we found no metrics studies. We discuss these later.

The studies reviewed do not comprehensively represent all written works related to the stewardship gap, but they constitute a large subset of such works. The bibliography on which this analysis is based is posted online (see and ), and we expect to add to it over time. A dynamic visualization of the data in Table 2 is available from York et al. ().

Results

Many stories could be told from the results presented in Table 2. From the perspective of measuring the stewardship gap, the most pertinent results are imbalances and differences in the numbers and types of studies across the gap areas. More specifically:

  • Imbalances in the attention given to different gap areas
  • Imbalances between the number of measurements and metrics studies
  • Differences in the depth of investigation undertaken

Imbalances in attention to different areas

The counts of studies in Table 2 make clear the differing amounts of attention given to measuring different aspects of the stewardship gap in our sample. The small amount of attention given to Commitment and Collaboration is particularly striking because these are two areas where deficiencies or strengths have the greatest potential impact on other gap areas (as identified in Figure 2). Also notable is the large number of studies that focus on sharing and access (under the umbrellas of both Culture and Sharing and Access) in comparison to the smaller numbers on Sustainability Planning, Legal and Policy, Funding, and Curation, Management, and Preservation, given the influence the latter areas have on data sharing (also shown in Figure 2).

Table 2 also illustrates the differing attention given to metrics across the gap areas. Some of the most striking results point to areas where no metrics were found (30 out of 56 areas). These include fragmented data management, lack of strategy and planning, dynamic and adaptable infrastructure, discoverability, kinds of collaboration, adequate funding or staff support, and different cultures of research and development. A lack of metrics in these areas may indicate a lack of common targets for individuals or organizations to achieve, or a deficiency in means of evaluating progress.

Future research needs to address the importance of areas that have been little studied until now, and direct attention to those that will have the greatest impact on future stewardship.

Imbalances in measurements and metrics studies

There are several areas where the contrast between measurement and metrics studies is particularly pronounced. These include sharing attitudes and practices (45 measurement studies to 2 that articulate metrics), reuse of data (26 to 2), fragmented data management (21 to 0), lack of skills (19 to 4), lack of infrastructure (19 to 2), lack of tools (16 to 1), lack of funding (12 to 0), difficulty managing data for reuse (14 to 2), lack of support for data management (10 to 0), and incentives, deficiencies, and alignment in legal and policy issues (5 to 1, 10 to 0, and 11 to 3, respectively).

One of the common challenges encountered by studies of stewardship gap areas is the difficulty of obtaining comparable results across different academic domains, especially at large scale. For example, Borgman et al. () note that while the case study method they use to investigate research data infrastructures could be used in other domains, large-scale surveys would likely be less effective due to the importance of local context. Similarly, in their study of the value and impact of research data, Beagrie and Houghton () describe the challenges of conducting their study in different contexts: “The data collection and economic analysis are time consuming and need to be tailored to the specific nature of operation and use of each data centre.” It is possible that a greater focus on metrics in the areas above (e.g., what indicates a lack of infrastructure; what it means for data management to be fragmented; how the difficulty of managing data can be quantified) as well as areas where metrics have been articulated but not widely agreed upon, would result in the collection of more consistent information in different contexts and domains. This could in turn result in more consistent measurement and comparison of research findings across disciplinary boundaries and at scale.

The imbalance between measurement and metrics studies suggests that future research should emphasize metrics: in effect, establishing broadly applicable standards for measuring discrete aspects of effective stewardship. Such standards would help clarify how to improve stewardship, both within specific research and data domains and more generally.

Differences in the depth of studies

The literature contains multiple types of studies. In one common type, which we termed “targeted,” the entire investigation focuses on one or two closely related areas, such as resources or specific actions like curation (e.g., , , , and , , Cirrinnà). Another common type comprises “wider” studies, which investigate several different gap areas at once, often in the context of a single institution, a nation’s scientific enterprise, or a comparative international framework (e.g., , , , , , , , , , , , , , ). Of the 142 studies we reviewed, 115 were targeted and 28 were wider.

Wider studies, though they may cover many topics (e.g., in the context of a survey), often include only one or a few questions about any given gap area ( and Tenopir et al. 2012 are two exceptions that examine multiple gap areas in depth). A raw count that mixes targeted and wider studies may thus overestimate the depth of investigation that has occurred in a particular area. Table 3 shows the 16 of the 50 overall gap sub-areas, among those where we found either a measurement or a metrics study, in which the proportion of wider studies (which did not necessarily investigate the indicated area in depth) is markedly higher than that of targeted studies. We include only measurement studies in the table because all but one metrics study (one related to responsibility for conducting stewardship activities) were targeted.

Table 3

Gap sub-area measurement studies with a larger proportion of “wider” studies than “targeted”.

Gap Sub-area | Targeted | Wider | Total
Fragmented data management | 7 | 14 | 21
Lack of infrastructure | 4 | 16 | 19
Lack of skills | 4 | 15 | 19
Difficulty managing data for reuse | 3 | 11 | 14
Insufficient data curation or management | 4 | 14 | 18
Lack of funding | 2 | 10 | 12
Lack of tools | 4 | 12 | 16
Identifying what is valuable | 4 | 7 | 11
Lack of support for data management | 0 | 10 | 10
Conduct stewardship activities | 0 | 9 | 9
Deficiencies that inhibit stewardship, access, and use [in legal and policy areas] | 0 | 10 | 10
Standards | 1 | 7 | 8
Incentives that support stewardship, access, and use | 0 | 5 | 5
Evaluation of quality | 1 | 4 | 5
Lack of people | 1 | 4 | 5
Lack of strategy and planning | 1 | 3 | 4
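
The imbalance in Table 3 can be made concrete by computing each sub-area’s share of wider studies. The following Python sketch (illustrative only, not part of the original study) does this for a handful of rows transcribed from the table:

```python
# Tabulate a sample of Table 3 rows as (targeted, wider, total) counts of
# measurement studies, then compute the share contributed by wider studies.
table3 = {
    "Fragmented data management": (7, 14, 21),
    "Lack of support for data management": (0, 10, 10),
    "Standards": (1, 7, 8),
    "Lack of strategy and planning": (1, 3, 4),
}

# Share of measurement studies in each sub-area that are "wider" studies.
shares = {
    area: wider / total
    for area, (targeted, wider, total) in table3.items()
}

for area, share in shares.items():
    print(f"{area}: {share:.0%} of measurement studies are wider")
```

For example, wider studies account for 7 of the 8 measurement studies on standards, so a raw count of 8 studies overstates how much in-depth, targeted investigation that sub-area has received.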

The proportion of targeted versus wider studies is an important factor in understanding the universe of research relevant to the stewardship gap. In many cases, such as those indicated in Table 3, not only more research but more in-depth research is needed to advance our knowledge of the stewardship gap and to guide policy makers, researchers, and research institutions in ensuring that the research data critical for future success are well stewarded.

Conclusion

This paper has reported the results of our efforts to understand the nature and characteristics of the stewardship gap through a review of relevant literature. In the process of our review we came to understand that there is not a single stewardship gap, but rather numerous and diverse components that contribute to and influence whether research data are responsibly stewarded. We identified 14 gap components or areas from the literature, along with the relationships between them. We further categorized these components into six major areas (Culture, Knowledge, Responsibility, Commitment, Resources, and Actions) and identified studies that had been conducted to measure or develop metrics in these areas and their corresponding sub-areas. Our effort to measure the stewardship gap led us to focus on three primary results: imbalances in the attention given to different gap areas in the reviewed literature, imbalances in the number of measurement versus metrics studies, and differences in the depth at which studies investigated gap areas.

Our review has shown the stewardship gap literature to be rich with descriptions of challenges to effective stewardship, but measurement of those challenges is uneven. At the same time, the literature is also rich with descriptions of the relationships between challenge or gap areas, and these relationships can guide institutions and organizations, acting individually or cooperatively, in prioritizing and addressing the gap areas most relevant to their situations and needs. Some key questions going forward are:

  • What strategies are most effective for addressing particular gaps or combinations of gaps, and over what timescales?
  • How might these strategies differ depending on discipline, cultures of practice, or levels of knowledge, responsibility or commitment?
  • How can we improve ongoing measurement and evaluation of gap areas to adjust strategies appropriately over time?
  • How can we stay abreast of changes to the gap areas themselves to ensure meaningful and accurate measurement?

Regarding the final two questions, it is important to note that the gap areas presented in this paper do not represent all gap areas, only those identified in the literature reviewed. In addition, our review does not cover all works relevant to the stewardship gap. Although it covers a significant subset, and has substantially guided the direction of our research, the stewardship gap bibliography is a work in progress that we expect to become more comprehensive over time through continuing investigation.

Data Accessibility Statement

The works reviewed in the samples of literature used in the study (samples A, B, and C), together with information pertaining to the evidence of gaps, gap relationships, and study designations associated with each, are compiled in the “Stewardship Gap Project Bibliography,” available at https://doi.org/10.7302/Z2ZW1J47.

Additional File

The additional file for this article can be found as follows:

Appendix

Description of Gap Areas. DOI: https://doi.org/10.5334/dsj-2018-019.s1