1. Introduction

To advance many frontiers of science, research data must be shared across the borders of both disciplines and countries, among the various organizations, institutions, and research teams that often stretch around the globe. In this internationally distributed and multidisciplinary environment, open science requires data that are discoverable and reusable in accordance with the FAIR Principles (https://www.go-fair.org/fair-principles/). These principles guide data producers, publishers, and other involved parties to share data in ways that enhance the ability of machines to automatically find and use scholarly data, in addition to reuse by individuals. In other words, the FAIR principles emphasize and support data sharing practices that enable machine-actionability: finding and using data in the same manner a human would, but with different scope, scale, and speed (). Data management plans (DMPs) address the what, how, who, and where of data management by formally outlining the roles, responsibilities, and activities for managing data during and after research (), in alignment with the FAIR principles.

Recent calls to create machine-actionable DMPs (maDMPs) that allow for automatic exchange, integration, and validation () could further move science toward standardized research data management (RDM) practices and DMP formulation. Making DMPs machine-actionable could facilitate data discovery and reuse and enable automated evaluation and monitoring (; ). To facilitate the automated quality evaluation of proposed DMPs, Cardoso et al. () propose the use of closed questions (e.g., name repositories, list metadata standards relevant to the discipline, identify file formats) that researchers could answer during DMP creation, an approach in alignment with Miksa et al.’s () call to implement 10 principles promoting maDMPs. When the results of automated checks indicate that the stated plans are inadequate, human evaluation will still be needed.

At present, DMPs remain text-based documents whose only obligation is to align with funder requirements, leaving any consideration of the FAIR and other data principles up to the researcher. Tools have been created and evaluated to help with writing DMPs (e.g., ), yet concerns remain that DMPs are created unenthusiastically, often as an afterthought (e.g., ), and that reviewers accept them rotely ().

2. Research Question

Despite efforts to make DMPs machine-actionable, maDMPs are not yet the norm. Further, evaluation work on current text-based DMPs has been limited (e.g., ), and the literature reveals that this work has not extended to machine-based content analyses of DMPs. This project addresses that gap through the following research questions:

RQ1: How does an automated content analysis of a set of DMPs compare to a manual evaluation?

RQ2: What does the analysis of successful DMPs reveal?

Using DMPs that accompanied successful proposals to an international and interdisciplinary funding competition, this research investigates whether automated means can evaluate them effectively and how automated evaluation methods may provide insights different from those of manual evaluations. The sample is the entire population of DMPs from the 21 funded projects in two funding categories: ocean sustainability and arctic systems.

3. Review of the Literature

3.1. DMPs and data sharing

Top-down mandates for data sharing and reuse are compelling, yet researchers still need to be convinced of the importance of managing data responsibly, in a way that will allow for future sharing and reuse. Resistance to this goal of DMPs, data sharing, has been observed (). Reasons cited for hesitating to publish data openly include the potential to have one’s work ‘scooped’ (; ; ), a low perceived return on investment, possible misuse or use against the researcher (), cyberinfrastructure issues, researcher foot-dragging (), and problems with data interoperability and integration (; ). Other studies report that scientists lack the time and resources to make their data suitable for sharing (). If having a plan supports data sharing, not having a plan, or not being ready to implement one, seems to allow for plausible deniability. To make these new workflow tasks easier, tools such as DMPtool.org, DMPonline.dcc.ac.uk, and Data Stewardship Wizard (https://ds-wizard.org/) have been created to assist researchers with DMPs, to name only a few.

Another factor affecting attitudes toward DMPs and their end goal of data sharing is the mixed messages that funders may send. Dietrich et al. () reviewed funding organizations and found uneven, disjointed coverage in data management policies concerning storage, licensing, metadata, and sharing. While funders may require DMPs from researchers, the required elements within these DMPs vary significantly across organizations; as a result, researchers following the RDM guidelines required by one organization may render the data and digital outputs difficult to use (or even unusable) by the original research team or by researchers from another organization or domain. In the early days of requiring DMPs, the U.S. National Science Foundation (NSF) reportedly expected that standards for content would emerge through the community of practice (). Although there are benefits to being descriptive rather than prescriptive with DMP requirements, leaving the content to a community of practice has the potential to exclude researchers outside that immediate field. Leaving requirements fluid may penalize transdisciplinary teams seeking funding; it is likewise potentially uninviting to interdisciplinary or extra-disciplinary secondary users of the data or digital objects produced.

Still, funding agencies do offer training and suggested parameters for the elements a DMP should contain. The Belmont Forum’s Data and Digital Outputs Management Plan (DDOMP) provides specific questions about the data and other digital outputs for researchers to address in their proposals, and these questions map directly to a scorecard used in evaluation. The focus is on 16 criteria in 9 broad areas: (1) the data itself; (2) data storage and use; (3) data management personnel; (4) data security; (5) data preservation concerns; (6) restrictions (required only if necessary); (7) intellectual property; (8) supporting documentation; and (9) long-term costs. maDMPs, as well as the scoring of any DMP, require measurable parameters, and at this broad level every domain can address these key descriptive and contextual metadata.

3.2. Automated approaches to DMP analysis

The literature indicates that creating maDMPs could enable the automated assessment of DMPs. A maDMP format that supports automated assessment includes themes/data management components representing the common topics addressed in DMPs, together with a controlled vocabulary associated with those themes (). Though there is little to no literature reporting efforts to evaluate DMPs automatically, there are approaches DMP stakeholders can take to automate the evaluation process, among them text mining techniques such as n-gram analysis. N-gram analysis converts a text into a set of n-grams such as unigrams, bigrams, and trigrams. Those n-grams and their frequency distributions can then be matched against the controlled terms associated with each DMP component to determine which key components are present in the proposed plans. However, for some components, such as ethical issues, evaluation needs to go beyond assessing the mere presence of DMP components; in such cases, human input is inevitable ().
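
As an illustration of this general approach, the minimal sketch below (in R, the language also used later in this study for the manual-score analysis) derives unigrams and bigrams from DMP text with the tidytext package and checks them against a small, hypothetical controlled vocabulary keyed to DMP components. The document text, the vocabulary terms, and the component labels are illustrative assumptions, not the tooling or term lists used in this study.

```r
library(dplyr)
library(tidytext)

# Illustrative DMP fragments
dmps <- tibble(
  doc_id = c("Oc1", "Ar1"),
  text = c("Data will be deposited in a public repository described with standard metadata.",
           "The project lead is responsible for data security and preservation.")
)

# Hypothetical controlled vocabulary: one term per DMP component
vocab <- tibble(
  component = c("data sharing", "metadata", "personnel", "data security", "preservation"),
  term      = c("repository", "metadata", "responsible", "data security", "preservation")
)

# Unigrams and bigrams from each document (lowercased by unnest_tokens)
ngrams <- bind_rows(
  dmps %>% unnest_tokens(term, text, token = "words"),
  dmps %>% unnest_tokens(term, text, token = "ngrams", n = 2)
)

# Which components are evidenced by at least one matching term in each DMP?
ngrams %>%
  inner_join(vocab, by = "term") %>%
  distinct(doc_id, component)
```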

3.3. Manual DMP assessment and evaluation

The bulk of work to assess DMPs has been done manually. One primary example is the Institute of Museum and Library Services (IMLS) Document Assessment and Review Tool (DART) project. The DART project developed a rubric to evaluate NSF DMPs in hopes of standardizing their assessment and enabling institutional comparisons of the absence or presence of certain elements. Ultimately, the rubric’s list of potential elements was so exhaustive that any use of it would take considerable time (). Grant reviewers, for example, may have other elements, such as scientific merit and impacts, to prioritize in a review, though these assumptions are anecdotal.

Post-award assessment work, generally done through qualitative studies, has shown the quality of DMPs to be variable. An earlier version of the DART rubric was used to assess 29 DMPs, finding that the quality of the DMPs varied greatly overall, with roles and responsibilities for data management, metadata standards for describing research data, and policies for protecting intellectual property rights the elements most often missing (). Other work has taken the form of case studies examining DMPs for certain funding agencies (often NSF) at specific universities (e.g., ; ; ; ). Assessments of the contents of DMPs in the literature have been both positive and negative. A 2015 study reviewed 182 DMPs across many disciplines and found that 80% of plans adequately described how data would be archived (), a result construed as positive. Still, most research in this area highlights the shortcomings of DMPs, generally from the perspective of information scientists or academic librarians. For example, in reviewing 119 DMPs, one team found that 51% did not identify the individual(s) responsible for data management, consistent with prior research findings (). As indicated above, a number of these studies revealed less than enthusiastic approaches to DMP creation and, potentially, to implementation and use.

4. Methodology

Twenty-one DMPs from projects funded by the Belmont Forum were analyzed: 13 DMPs from the Transdisciplinary Research for Ocean Sustainability call (https://www.belmontforum.org/cras/#oceans2018; abbreviated here ‘Oc#’) and 8 DMPs from the Resilience in Rapidly Changing Arctic Systems call (https://www.belmontforum.org/cras/#arctic2019; abbreviated here ‘Ar#’). These DMPs were submitted to the Belmont Forum, which had published information about DMP expectations, including a scorecard () for researchers to use to self-evaluate their DMPs (). The sample of this study is the entire population of DMPs from the 21 funded projects in these two categories. Figure 1 provides an overview of the research methodology.

Figure 1 

Overview of Research Methodology.

4.1. Automated analysis

Automated analysis was conducted in MAXQDA Analytics Pro (), using its MAXDictio functionality. To determine terms associated with the scorecard () criteria, we used text-mining features such as unigram frequencies and n-gram combinations, offered through the ‘word frequencies’ and ‘word combinations’ functions in MAXDictio. These functions allow users to pre-process the corpus for noise reduction using tidy-text techniques such as lemmatization and stop word removal. Lemmatization replaces words with their base form (); for example, the words ‘plays,’ ‘played,’ and ‘playing’ have ‘play’ as their lemma. Stop words are common, high-frequency words such as ‘a,’ ‘the,’ ‘of,’ ‘and,’ and ‘an’ (). The stop word list can be customized: content words can be added to it if they are judged unlikely to provide insight into the corpus. Stop word removal reduces the dimensionality of the dataset and thereby enhances the performance of feature extraction algorithms in text mining (). When using the word frequencies and word combinations functions in MAXDictio, we applied stop word removal with the built-in standard stop word list, applied lemmatization, and searched for word combinations of two to three words.
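
MAXDictio performs these pre-processing steps through its interface; the sketch below reproduces a comparable workflow with open-source R packages, assuming tidytext for tokenization and its built-in English stop word list and textstem for lemmatization. The sample text, object names, and choice of packages are illustrative, not the study's actual pipeline.

```r
library(dplyr)
library(tidytext)
library(textstem)

# Illustrative single-document corpus
dmps <- tibble(
  doc_id = "Oc1",
  text = "The datasets are described with metadata and shared through an open repository."
)

# Unigrams: lowercase tokens, built-in English stop words removed, then lemmatized
unigrams <- dmps %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  mutate(lemma = lemmatize_words(word))   # e.g., 'plays', 'played', 'playing' -> 'play'

# Word combinations of two to three words
combos <- dmps %>%
  unnest_tokens(ngram, text, token = "ngrams", n = 3, n_min = 2) %>%
  count(ngram, sort = TRUE)
```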

Results from word frequencies and word combinations show the frequency of each term in the corpus, its unique occurrence across the documents, and the percentage of documents in which it uniquely occurs. The results were then reviewed, and terms associated with the scorecard criteria were manually grouped together. The terms associated with the scorecard criteria () used in phase two of the analysis were identified by the research team: (1) data: type/collection methods/size; (2) metadata: standards/formats; timeframe; data sharing; (3) personnel; (4) security; data security; (5) data retention; preservation personnel; (6) restrictions; sensitive data; limitations; (7) intellectual property; intellectual property rights; licensing; (8) supporting documentation; and (9) costs; long-term costs. These terms, their lexical variants and synonyms, and related bi- and trigrams were produced, providing an overarching view of the DMP content.
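
Continuing the preceding sketch, the following lines show how the three quantities reported for each term (corpus frequency, number of documents, and percentage of documents) could be computed with dplyr; the object `unigrams` comes from the sketch above, and the divisor of 21 reflects the size of this corpus.

```r
library(dplyr)

# `unigrams` is the illustrative object from the preceding sketch; in practice
# it would cover all 21 DMPs rather than a single document.
term_summary <- unigrams %>%
  count(doc_id, lemma, name = "n_in_doc") %>%
  group_by(lemma) %>%
  summarise(frequency    = sum(n_in_doc),
            n_documents  = n_distinct(doc_id),
            document_pct = round(100 * n_documents / 21, 2)) %>%
  arrange(desc(document_pct), desc(frequency))
```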

4.2. Manual scoring

Manual scoring of the same set of DMPs was carried out for comparison using the Belmont Forum scorecard. Members of the research team were trained using the instructions given to Belmont Forum award applicants and reviewers; coding took place in fall/winter 2021 and required weekly meetings over a span of three months to finalize. The scorecard presents 16 criteria in 9 broad areas: (1) the data itself; (2) data storage and use; (3) data management personnel; (4) data security; (5) data preservation concerns; (6) restrictions (required only if necessary); (7) intellectual property; (8) supporting documentation; and (9) long-term costs. These high-level criteria and their sub-criteria are provided in Appendix A. Full conformance is worth 2 points, incomplete or partial conformance is worth 1 point, and no response or lack of conformance is worth 0 points. Partial conformance is deemed adequate, as it implies the criterion was addressed to an extent but some aspect of the explanation was incomplete; for example, the DMP might not name a specific repository (e.g., ) or institution, or might otherwise address the criterion but not in an actionable way. When coding, team members met to discuss any differences in scores until 100% agreement was reached. An anonymized version of the dataset is available through GitHub at https://github.com/MinhphamMizzou/DMP_Belmont_Forum_analysis_.git.

Manually assigned scores were then imported into RStudio (Version 1.1.463 – © 2009–2018 RStudio, Inc.) using the R language () for analysis and visualization. The packages used for importing, transforming, analyzing, visualizing, and formatting included readxl (), dplyr (), ggplot2 (), scales (), tidytext (), and jtools (). The readxl package was used to import the scores from an .xlsx file into the R environment. Next, dplyr was used to transform the data, conduct an exploratory analysis, and obtain descriptive statistics. The ggplot2, scales, tidytext, and jtools packages were used to create and format figures. The scripts for the analysis can be found on GitHub at https://github.com/MinhphamMizzou/DMP_Belmont_Forum_analysis_.git.
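
A minimal sketch of that import-and-transform pipeline is given below; the file name and column names (dmp_id, criterion, score) are placeholders for illustration, not the schema of the published dataset.

```r
library(readxl)
library(dplyr)

# Long format assumed: one row per DMP x criterion, with a 0/1/2 score
scores <- read_excel("manual_scores.xlsx")

# Exploratory descriptive statistics
scores %>%
  summarise(mean_score = mean(score),
            sd_score   = sd(score),
            n_dmps     = n_distinct(dmp_id),
            n_criteria = n_distinct(criterion))
```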

5. Results

5.1. Automated analysis of DMPs

An automated analysis of the corpus provides insight into the extent to which required information is present. Terms (unigrams, bigrams, and trigrams) in the set that were essential to the scorecard categories were organized according to the category they primarily support. Table 1 presents these results by scorecard category, ordered by the percentage of documents in which the terms appear. In each category, the greater the number of documents, the better the coverage that can be inferred.

Table 1

Word frequencies of some of the n-grams essential to scorecard categories.


CATEGORIES                                 TERM (LEMMATIZED)        FREQUENCY   NO. DOCUMENTS   DOCUMENT %

(1) data/size                              type                     19          11              52.38
                                           datum format              4           4              19.05
                                           size                      2           2               9.52

(2) data storage/data use                  available                91          19              90.48
                                           datum repository         15          11              52.38
                                           datum storage            10           7              33.33

(3) data security/data access              metadata                 82          18              85.71
                                           security                 14           9              42.86

(4) data management personnel              datum management         65          20              95.24
                                           responsible              32          16              76.19

(5) data preservation                      project website          27           7              33.33
                                           preservation             12           7              33.33
                                           long-term use             3           3              14.29
                                           long-term preservation    2           2               9.52

(6) privacy (not required for all DMPs)    privacy                  13           9              42.86
                                           sensitive                19           8              38.10

(7) intellectual property                  intellectual property    17          12              57.14

(8) data sharing                           share                    65          17              80.95
                                           open access              26          15              71.43
                                           license                  29          11              52.38
                                           make available           23          10              47.62
                                           open datum policy         6           6              28.57

(9) cost                                   cost                     34          12              57.14

This automated analysis reveals considerable variation in the frequencies and the unique occurrences of the words associated with the scorecard criteria. Eight criteria had associated terms that appeared in at least half of the DMPs; information about (6) privacy was different in that it was only required if necessary. A bigram related to (4) data management appears in 95% of the DMPs; the related unigram responsible, also tied to data management personnel, appears in 76% of the DMPs. These results are interpreted as exceedingly promising for the adequate inclusion of information relating to personnel. Conversely, the bigram intellectual property, relating to statements about ownership of (7) intellectual property, was surprisingly inconsistent, appearing in only 12 documents (57%); likewise, cost appeared in only 57% of the DMPs. Unigrams and bigrams supporting (5) data preservation were also inconsistently supplied, with project website and preservation each appearing in only 7 DMPs (33%), long-term use in 14%, and long-term preservation in 10%. These variations seem to reflect researchers’ uneven attention to different components in the successful DMPs.

As noted, DMPs are free text and do not use controlled vocabularies; the automated methods used here are predicated on the consistent use of terminology commonly associated with the required elements of a DMP. Table 1 shows that terms associated with each category are evident and serve as a first step toward identifying the presence of key DMP components as well as evaluating their quality. This also reinforces previous assertions that, to facilitate the creation of machine-actionable DMPs, a list of controlled keywords associated with each key component of DMPs is needed (). To better identify the presence of key categories and evaluate their quality, there is also a need for analysis mechanisms such as the Description Logics Queries system proposed by Cardoso et al. (), which can ‘deduce implicit knowledge from the explicitly represented knowledge’ ().

5.2. Manual analysis

For comparison, a more granular score for the set of DMPs and for the individual categories was created through manual analysis. For each DMP scored manually, a composite score was calculated by summing the manual scores for each criterion and dividing by the total number of criteria. Plans with a composite score of 1 or more are deemed adequate to comply with the Belmont Forum’s Data Policy and Principles (https://www.belmontforum.org/archives/resources/belmont-forum-data-policy-and-principles). These policies and principles were developed around the same time as the FAIR Data Principles, and their four sentiments, that data be discoverable, accessible, understandable, and available for future use in sustainable, trusted repositories, map closely to FAIR without forming a readable acronym. This study finds that 86% (n = 18) of the DMPs provided adequate information, with a composite score ≥1. Of those, 14 DMPs (77.8%) scored between 1 and 1.5, with a mean of 1.27 (SD = 0.24); the remaining 4 of the 18 adequate DMPs (22.2%) had composite scores greater than 1.5, closer to 2. Overall, the sample (N = 21) had composite scores ranging from 0.81 to 1.69 on a scale from 0 to 2, averaging 1.21 (M = 1.21, SD = 0.27). Detailed descriptive statistics of the sample are provided in Table 2.
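
The composite-score calculation can be expressed compactly in dplyr; in the sketch below the file and column names are again illustrative, and the final summary mirrors the quantities reported in Table 2.

```r
library(readxl)
library(dplyr)

scores <- read_excel("manual_scores.xlsx")   # one row per DMP x criterion, scored 0/1/2

# Composite score per DMP: sum of criterion scores divided by the number of criteria
composites <- scores %>%
  group_by(dmp_id) %>%
  summarise(composite = sum(score) / n_distinct(criterion)) %>%
  mutate(adequate = composite >= 1)

# Descriptive statistics by adequacy, in the manner of Table 2
composites %>%
  group_by(adequate) %>%
  summarise(n = n(), M = mean(composite), SD = sd(composite),
            Median = median(composite), Min = min(composite), Max = max(composite))
```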

Table 2

Descriptive statistics based on DMP composite scores.


                                               N (OUT OF 21)   M      SD     MEDIAN   MIN    MAX

Adequate DMPs (average composite score ≥1)     18 (86%)        1.27   0.24   1.22     1.00   1.69

Inadequate DMPs (average composite score <1)    3 (14%)        0.82   0.04   0.81     0.81   0.88

Total                                          21              1.21   0.27   1.19     0.81   1.69

Next, average scores for the 16 scorecard criteria were calculated; whereas the automated analysis treated the 16 criteria according to the nine groups in which they are found, the manual analysis allows for greater granularity in the results. Average criterion scores ranged from 0.1 to 1.9 on a scale of 0 to 2 (see Figure 2). Three quarters of the criteria had scores averaging ≥1 (n = 12; 75%), with one quarter averaging <1 (n = 4; 25%). The average score for the criteria across all the plans was 1.21 (M = 1.21, SD = 0.44).
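
A per-criterion summary and a Figure 2-style bar chart can be produced with dplyr and ggplot2, as sketched below; the input file and column names remain illustrative.

```r
library(readxl)
library(dplyr)
library(ggplot2)

scores <- read_excel("manual_scores.xlsx")   # one row per DMP x criterion, scored 0/1/2

# Average score per scorecard criterion across all DMPs
criterion_means <- scores %>%
  group_by(criterion) %>%
  summarise(mean_score = mean(score))

# Horizontal bar chart in the manner of Figure 2
ggplot(criterion_means, aes(x = reorder(criterion, mean_score), y = mean_score)) +
  geom_col() +
  coord_flip() +
  ylim(0, 2) +
  labs(x = NULL, y = "Average score (0-2)")
```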

Figure 2 

Average scores across all DMPs, by scorecard criterion.

The two criteria on which the plans performed best were a) Data management personnel (average score: 1.9), for which full conformance required that a named individual be responsible for data management; and b) Data sharing (average score: 1.86), for which full conformance required that a repository be named. Although the level of conformance could not be confirmed using automated means, these scores approximate the results of the automated analysis. Conversely, the lowest average scores were associated with four criteria: a) Data size (average score: 0.1); b) Long-term data management personnel (average score: 0.81); c) Data retention (average score: 0.9); and d) Timeframe for access (average score: 0.9). Data size was by far the most overlooked criterion, addressed in only a single plan in the sample; this finding is also consistent with the results of the automated analysis.

Comparing criterion scores across adequate and inadequate DMPs offers a way to identify weak areas and target improvements in inadequate DMPs (see Figure 3). The adequate set of DMPs (n = 18; 86%) had consistently strong scores across the criteria: only three criteria averaged below 1, namely Data retention (average score: 0.94), Long-term data management personnel (average score: 0.94), and Data size (average score: 0.11). For the three inadequate DMPs, seven criteria averaged below 1: the three just mentioned, plus Supporting documentation (average score: 0.67), Timeframe for access (average score: 0.33), Data security (average score: 0.33), and Long-term data management costs (average score: 0.00). In other words, the four criteria that differentiated DMPs averaging above or below 1 were a) Supporting documentation; b) Data security; c) Timeframe for access; and d) Long-term data management costs.

Figure 3 

Average scores by scorecard criterion for adequate and inadequate DMPs.

To investigate the possible effect of word count on the adequacy of DMPs, word counts were calculated for each DMP. The average length of a DMP was about 1,000 words, with great variability across the DMPs (M = 1,004.8, SD = 301.6). The longest DMP, at 1,769 words, was almost three times the length of the shortest, at 591 words. Table 3 shows that the set with composite scores ≥1 tended to have higher word counts than the set with scores <1: the average word count in the adequate set was 1,053.2 (M = 1,053.2, SD = 294.8), 47% higher than in the inadequate set (M = 714.3, SD = 151.2). The adequate set also had greater variability in word counts (SD = 294.8) than the inadequate set (SD = 151.2), indicating that word counts in the adequate set are more dispersed and that its higher mean may be influenced by extreme values. Establishing any significant relationship between word count and the adequacy of DMPs requires further study with larger sample sizes.
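
The word counts and the Table 3-style comparison can be derived with tidytext and dplyr, as sketched below; the two-document corpus and the adequacy flags are stand-ins for the real data.

```r
library(dplyr)
library(tidytext)

# Illustrative stand-ins for the corpus and for the composite scores computed earlier
dmp_text <- tibble(
  doc_id = c("Oc1", "Ar1"),
  text   = c("Data will be archived in a trusted repository with documentation.",
             "Metadata will follow community standards.")
)
composites <- tibble(doc_id = c("Oc1", "Ar1"), adequate = c(TRUE, FALSE))

# Word count per DMP
word_counts <- dmp_text %>%
  unnest_tokens(word, text) %>%
  count(doc_id, name = "word_count")

# Comparison of word counts by adequacy, in the manner of Table 3
word_counts %>%
  inner_join(composites, by = "doc_id") %>%
  group_by(adequate) %>%
  summarise(n = n(), M = mean(word_count), SD = sd(word_count),
            Median = median(word_count), Min = min(word_count), Max = max(word_count))
```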

Table 3

Word count for adequate and inadequate DMPs.


                                               N (OUT OF 21)   M         SD      MEDIAN   MIN   MAX

Adequate DMPs (average composite score ≥1)     18 (86%)        1,053.2   294.8   1,036    606   1,769

Inadequate DMPs (average composite score <1)    3 (14%)        714.3     151.2   714.3    591   883

Total                                          21              1,004.8   301.6   1,025    591   1,769

6. Discussion and Implications

This automated approach to DMP assessment yields a less granular (i.e., binary) assessment of the presence or absence of a term. In this study, the presence of a term in the automated analysis is equated with Level 1 conformance, and the automated results are similar to the manual assessments. Although a 1:1 comparison of results is not possible given the two differing methodologies, the global results allow us to infer that machine-based approaches are a viable option for naïve or large-scale DMP assessment, at least in a preliminary way. Especially if a controlled vocabulary is developed, Level 1 conformance, the binary presence of a term, is easier to ascertain. Beyond a controlled vocabulary, if funder requirements included standard parameters and elements, DMPs could require categorical responses to each element within a parameter; this would make it easier for automated approaches to assess Level 2 conformance. The problem here is buy-in on the part of funders and reviewers.

DMPs are reviewed by funders manually, even though DMPs will ideally support machine-actionability and the implementation of the FAIR data principles. This study shows that the manual analysis assessed Level 2 conformance at greater granularity than the automated approach, contributing to more accurate evaluation of DMP quality. Although a one-by-one, manual approach to DMP assessment is reasonable for funders’ own purposes, it does not scale and it does not ensure machine-actionability; the extensive effort required will make manual coding difficult at scale. Further, the use of automated means aligns with recommendations for best practices (), implying that it is the more promising approach to pursue, especially for funders. There is a disconnect between the two realities; this discussion assesses some of the strengths and weaknesses identified in each.

6.1. Observations

The most frequent unigrams, bigrams, and trigrams in the automated analysis reveal that the elements researchers mentioned most in the DMPs were data management (n = 20; 95%), followed by availability (n = 19; 90%), metadata (n = 18; 86%), sharing (n = 17; 81%), and responsibility (n = 16; 76%). This suggests at least Level 1 conformance in these areas, with the binary inclusion of the concept inferred. Size and long-term preservation were inconsistently included terms in the DMPs (n = 2; 10% each), as were long-term use (n = 3; 14%), data format (n = 4; 19%), and open data policy (n = 6; 29%). These results were largely consistent with the results of the manual analysis.

Terms identified as supporting (4) data management personnel were two of the most frequently occurring: data management and responsibility. Manual assessments likewise performed best on this element, with an average score of 1.9 indicating high conformance. Full, Level 2 conformance requires that a named individual be included; such a name would not be captured using the automated methods employed, yet the results for the two methodologies track together. The same phenomenon is observed for the least frequently occurring term: size. Other terms identified as primarily supporting (1) data/size were type (n = 11; 52%) and data format (n = 4; 19%); together, these were the weakest-performing set of terms for an element according to the automated assessment. Data size was likewise the lowest-performing criterion in the manual assessment, with an average score of 0.1. Although the nuances of the content were not captured, automated content analysis seems promising as an overarching proxy for the inclusion of DMP content that is revealed most consistently through manual analysis.

6.2. Recommendations for improving DMPs

Close investigation of a number of elements provides insight into improvements for DMPs. First, we concur with the call to create and apply a controlled vocabulary of terms related to each element; this will ease assessment and the confirmation of Level 1 conformance. Second, adequate DMPs consistently had higher word counts, implying that more content is covered through length. Researchers should be encouraged to use as many words as needed to explain their plan rather than providing minimal information or inadequately describing their intentions; increased word counts in DMPs have the potential to supply context. Third, names supplied for Level 2 conformance should include the corresponding Level 1 element terms, to clarify what kind of name they are.

Specifically, the mismatch between the frequent mention of the term metadata in the DMPs (see Table 1) and performance on the manually coded ‘metadata standards and formats’ criterion indicates that researchers grappled with the specifics of this element. Incomplete planning with regard to metadata may negatively affect the implementation of a DMP, which in turn may hamper data sharing and data accessibility. The mismatch between the use of the terminology and the consistent inclusion of Level 1 or Level 2 information also suggests that providing researchers with scorecard criteria may not suffice to support the creation of effective DMPs and the implementation of efficient DMP practices. Beyond the criteria from funders, there is a need for guidelines on effective DMP practices from funding organizations and research institutions () and for support with DMP writing from institutional authorities (). In the case of metadata, researchers need help from information professionals in developing and implementing their DMPs.

7. Limitations and Further Study

This study analyzed successful DMPs from one international funder, the Belmont Forum; unfunded DMPs were not included in the analysis, nor were successful DMPs from other funding agencies. Additionally, the projects described were interdisciplinary, focusing on the sciences; although social science methodologies were employed in some of the data collection, this was not the case across the board, and no humanities data were collected or described. Further study should address these limitations of the sample population analyzed. The methods of analysis used do not permit a one-to-one comparison of results for a particular DMP; rather, the automated analysis of a set of DMPs provides an overview of the set’s conformance based on the character strings in the documents, while the manual analysis was more granular on all counts. Future studies could examine other automated approaches to explore the possibility of generating results that can be compared one-to-one with manual results. Further, neither analysis addresses whether there has been compliance with the DMPs as written; the projects for which these DMPs were written are still ongoing, and future research can look back to analyze the extent to which the DMPs were updated to become living documents () and the extent to which the parameters of the DMPs were respected.

8. Conclusion

This project seeks to understand, through the analysis of 21 DMPs associated with funded research projects, both the extent to which successful DMPs can be considered useful at present and how DMPs can be improved going forward. Requirements for DMPs have become increasingly visible in the landscape of funded research in an effort to promote more transparent, reproducible, and open scientific results. While the importance (and limitations) of DMPs have been extensively explored in the literature, less work has been dedicated to evaluating these documents with standardized metrics to determine how well they frame the data management considerations of a research endeavor. This project used the Belmont Forum scorecard to assess the completeness of 21 DMPs and how well they incorporated information on data and digital objects, including their storage and sharing and the data management personnel responsible for these activities, according to the metrics included in the scorecard. While many of the DMPs were found to be adequate, several areas of data management remained largely overlooked, including data size, the timeframe for accessing the data, data retention policies, and information on long-term data management personnel.

As part of the process of analyzing DMPs and the potential for their evaluation, we considered the utility of any scoring initiative: the IMLS-funded DART project was widely perceived as being too restrictive, and the scorecard approach studied here had the limitations of not seeing the full proposal and of being applied by a team of data specialists rather than experts in the field. Yet these scorecard evaluations provide a snapshot, allowing for comparisons across DMPs. Recommendations for improvements to support data sharing include funders treating DMPs as living documents, limiting options for responses to specific, actionable content, and continued evaluation by all relevant stakeholders, including data experts along with subject-area experts. Although DMP is the common term for most required plans, the Belmont Forum’s DDOMP emphasizes the other digital objects that make science reproducible but are not technically data. Future DMP literature should explore the curatorial considerations for these items, not just data, that support future reuse.