This paper details the drivers, methods, and outcomes of the U.S. Geological Survey’s quest to establish criteria by which to judge its own digital preservation resources as Trusted Digital Repositories. Drivers included recent U.S. legislation focused on data and asset management conducted by federal agencies spending $100 million (USD) or more annually on research activities. The methods entailed seeking existing evaluation criteria from national and international organizations such as the International Organization for Standardization (ISO), the U.S. Library of Congress, and the Data Seal of Approval upon which to model USGS repository evaluations. Certification, complexity, cost, and usability of existing evaluation models were key considerations. The selected evaluation method was derived to allow the repository evaluation process to be transparent, understandable, and defensible, factors that are critical for judging competing internal units. Implementing the chosen evaluation criteria involved establishing a cross-agency, multi-disciplinary team that interfaced across the organization.
As the Nation’s largest water, Earth, and biological science and civilian mapping agency, the U.S. Geological Survey (USGS) collects, monitors, analyzes, and provides scientific information about natural resource conditions, issues, and problems. The diversity of the USGS’s scientific expertise enables the organization to carry out large-scale, multi-disciplinary investigations and provide impartial scientific information to resource managers, planners, and decision makers.
Since the inception of the USGS in 1879, the agency has maintained comprehensive internal and external policies and procedures for ensuring the high quality and integrity of its generated scientific interpretations and products. The documented guidance has led to the USGS’s reputation for scientific excellence and objectivity. As new technologies developed and research became more digital, the USGS had to create and adopt new policies. In 1993, the first internal policies requiring preservation of digital assets were instituted. Ten years later, the USGS established the web-based USGS Publications Warehouse, its first digital library.
Existing scientific policies and procedures were updated in 2006 and are now known as the USGS Fundamental Science Practices (FSP). These changes established a set of consistent practices, philosophical premises, and operational principles to serve as the foundation for research and monitoring activities related to USGS science.
In January 2009, the Director of the USGS announced the establishment of a Fundamental Science Practices Advisory Committee (FSPAC), which would be responsible for addressing pending and new FSP issues (including previously unresolved issues), fielding questions and concerns about FSP from scientists and managers, and developing recommendations for resolving issues. The FSPAC ensures that the USGS continues to produce high quality, objective science information products by creating guidance on conducting science projects and establishing review processes. In 2012, the FSPAC established a Data Preservation Sub-Committee (Sub-Committee) to identify and guide USGS science data stewardship, preservation, and documentation requirements. This effort resulted in the 2015 USGS policy entitled “Fundamental Science Practices: Preservation Requirements for Digital Scientific Data”.
These policies reflected the push for open data and access occurring on a national scale. On February 22, 2013, the Office of Science and Technology Policy (OSTP) issued a memorandum, Increasing Access to the Results of Federally Funded Scientific Research, which called on all federal agencies with annual research and development (R&D) outlays of more than $100 million to develop a plan to increase public access to the direct results of federally funded scientific research, specifically including peer-reviewed publications and digital data. In addition, on May 9, 2013, the Office of Management and Budget (OMB) released Memorandum M-13-13, Open Data Policy – Managing Information as an Asset. Individually and collectively, these directives established the mandates for the U.S. Federal Government to transform data and information into usable and accessible digital artifacts and to promote and accelerate their release (subject to certain limitations imposed by privacy, confidentiality, and national security considerations).
Because 74 percent ($686 million) of the Department of the Interior’s total R&D budget is allocated to the USGS, the bureau was responsible for developing a plan to comply with the OMB memorandum. The USGS Plan focused specifically on the USGS’s public access activities, policies, and plans as they affect both intramural and extramural research and development activities. It also required that data be stored in a USGS Trusted Digital Repository. Because of its earlier work on preservation policies, the Sub-Committee was tasked with establishing standards for evaluating USGS data repositories and their trustworthiness.
Composed of volunteer staff representing the fields of archival science, digital libraries, computer science, information technology, publishing, and the information sciences, the Sub-Committee evaluated several existing sets of criteria developed by other organizations in order to identify elements relevant to the USGS’s pursuit of establishing Trusted Digital Repositories. The USGS sought methods that would be considered transparent, authoritative, and scalable, allowing anything from minimal to wide interagency use. The criteria and approach also had to be practical for the USGS: the anticipated level of effort and the direct costs of a specific implementation had to be attainable. Lastly, the USGS sought an approach that offered certification from a reputable organization. The individual criteria sets examined are outlined in the sections that follow.
The first set of criteria reviewed was the U.S. Federal Records and Information Management Program Maturity Model (JWG FRC and NARA 2014), which included categories such as Management Support & Organizational Structure; Policy, Standards, & Guidance; and RIM Program Operations. The categories were then further sub-divided as illustrated below:
This model was found to be extremely comprehensive to the point of even including training elements for staff on records and information management. The model’s detail and thoroughness also led the USGS to perceive that implementing such a scheme would be somewhat burdensome. Additionally, the scheme was not intended to provide a certification process.
Another set of criteria examined was compiled by the United Kingdom’s Digital Curation Centre (DCC) under the title “Where to keep research data: DCC Checklist for Evaluating Data Repositories” (Whyte 2015). This checklist was built around the following questions:
The DCC criteria are labeled as a checklist and are not intended to provide a certification outcome. So, while the list is offered from an authoritative source, the lack of a formal certification option kept this from consideration by the USGS Sub-Committee. The DCC questions, however, were found to be relevant and understandable.
The next approach originated at the National Oceanic and Atmospheric Administration (NOAA) and is described in the paper “A Unified Framework for Measuring Stewardship Practices Applied to Digital Environmental Datasets” (Peng et al. 2015). The key components are Preservability, Accessibility, Usability, Production Sustainability, Data Quality Assurance, Data Quality Control/Monitoring, Data Quality Assessment, Transparency/Traceability, and Data Integrity. Each of the nine components has a ranking to be assigned. The rankings all utilize the scheme below:
The NOAA criteria are again fairly comprehensive, with an emphasis on quality elements. The consistent five-level scoring method for each element was straightforward and would be fairly easy to explain. However, applying the levels across all of the elements would involve some subjectivity, and thus the USGS did not choose this approach. Additionally, the NOAA option is not intended to provide certification.
Each of the Data Seal of Approval (DSA) criteria had additional text, statements, and questions provided to assist in comprehending the information being requested. All of the criteria were judged to be relevant, the length and complexity of the criteria were not considered to be overly burdensome, and the DSA offered a certification option. The USGS also had previous experience with the DSA approach: one representative from the Sub-Committee had gone through the DSA process and received certification in 2015. This experience and the applicability of the criteria led the USGS to consider adopting this approach.
The International Organization for Standardization (ISO) issued a standard labeled 16363–2012 related to records management. Module 8, entitled Becoming A Trusted Digital Repository, describes the required elements of the ISO standard. The elements are grouped under the Organizational Infrastructure, Digital Object Management, and Infrastructure and Security Risk Management categories. Additional topics under each element include:
The ISO approach is very complete and exhaustive, as expected from this authoritative organization. As with other approaches outlined above, the level of detail sought and the anticipated time commitment led the USGS to conclude that this approach would be too burdensome at this stage in the agency’s effort to establish a process and criteria leading to Trusted Digital Repositories. The ISO approach does, however, provide certification.
The Library of Congress sponsored the National Digital Stewardship Alliance (NDSA) (Phillips et al. 2013), which developed a “…tiered set of recommendations for how organizations should begin to build or enhance their digital preservation activities. It allows institutions to assess the level of preservation achieved for specific materials in their custody, or their entire preservation infrastructure. It is not designed to assess the robustness of digital preservation programs as a whole since it does not cover such things as policies, staffing, or organizational support. The guidelines are organized into five functional areas that are at the heart of digital preservation systems: storage and geographic location, file fixity and data integrity, information security, metadata, and file formats”.
The USGS Data Preservation Sub-Committee built upon the NDSA recommendations and replaced some text (e.g. changed fixity to checksums) to be more understandable to USGS personnel. The USGS added the element of Physical Media because of the large role that media selections can have on the preservation of agency science data. The levels, beginning with the most elementary achievement of level one, extend to the desired and more demanding level four attainments. These recommendations are presented in Table 1.
|ELEMENT||LEVEL ONE||LEVEL TWO||LEVEL THREE||LEVEL FOUR|
|Storage and Geographic Location||Two complete copies stored physically separate from each other; Transfer the digital content from temporary media into an established storage system; Managed storage system in place||Three complete copies; At least one copy in a different geographic location (offsite locations must follow NARA 1571 guidelines (NARA 2002)); Document the storage system and storage media||At least one copy in a geographic location with a different disaster threat (e.g. hurricane area versus an earthquake area); Maintain an obsolescence monitoring process for the storage system and media||At least three copies in geographic locations with different disaster threats; Implement a comprehensive plan that keeps files and metadata on currently accessible systems and media|
|Data Integrity||Verify checksums on ingest, if provided; Create checksums if not provided; Virus check all content||Verify checksums on all data ingest; Use read-only procedures when working with original media||Verify checksums at fixed intervals; Maintain logs of checksums and supply audit information on demand; Maintain procedures to detect corrupt data; Virus check all content||Verify checksums of all content in response to specific events or activities; Maintain procedures to replace or repair corrupted data; Ensure no one person has write access to all copies; Create, store, and verify a second, different checksum for all content|
|Information Security||Identify who has authorization to read, write, move, and delete individual files; Limit authorizations to individual files||Document access restrictions for content||Maintain logs of who performed what actions on files, including deletions and preservation actions||Perform audit of logs|
|Metadata||Inventory of content and its storage location; Ensure backup and physical separation of inventory information; Adhere to current USGS metadata standards||Store all relevant database management information; Store information describing changes to the structure or format of the data, including time of occurrence; Provide access to all forms of the metadata||Preserve standard technical, descriptive, and preservation metadata||Same as Level 3|
|File Formats||Encourage the use of a limited set of documented and open file formats, codecs, compression schemes, and encapsulation schemes||Inventory the file formats in use||Monitor file format obsolescence issues||Perform format migrations, emulations (a virtual instance of a previous operating system or procedure) and similar activities|
|Physical Media||Inventory all physical media utilized, including hard disks||Develop a plan to utilize trade studies to evaluate media suitable for USGS purposes; Begin to transition away from all media utilized that are 10 years or more in age||All non-recommended media have been properly disposed of following transition activities||Base all media choices on trade studies; All information is migrated from older media to newer media every 3 to 5 years, including hard disks|
The NDSA criteria were not intended to provide a certification. The elements and progression are intended more to identify specific areas that digital repositories need to address. The table elements could easily be incorporated into an organization’s certification submission. As such, the USGS found the approach useful and advocated its use within the agency, but not as the tool recommended for agency certification.
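As an illustration of how the Data Integrity row of Table 1 might translate into practice, the following Python sketch creates a checksum when none is supplied, verifies a supplied checksum on ingest, and stores and re-verifies a second, different checksum. It is a minimal sketch and not a USGS implementation: the function names (`compute_checksum`, `ingest`, `verify`), the record layout, and the choice of SHA-256 and SHA-512 are assumptions made for the example, since the table does not name any algorithm or tool.

```python
import hashlib

# Assumption: SHA-256 is the primary checksum and SHA-512 is the "second,
# different checksum" of level four. Table 1 does not prescribe algorithms.
PRIMARY_ALGORITHM = "sha256"
SECOND_ALGORITHM = "sha512"


def compute_checksum(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(path, supplied_checksum=None):
    """Verify a supplied checksum on ingest, or create one if none was provided.

    Assumption: a supplied checksum uses the primary algorithm.
    Returns a record holding both checksums; raises ValueError on a mismatch.
    """
    primary = compute_checksum(path, PRIMARY_ALGORITHM)
    if supplied_checksum is not None and supplied_checksum.lower() != primary:
        raise ValueError(f"checksum mismatch on ingest: {path}")
    return {
        "path": str(path),
        PRIMARY_ALGORITHM: primary,
        SECOND_ALGORITHM: compute_checksum(path, SECOND_ALGORITHM),
    }


def verify(record):
    """Recompute both checksums and compare them with the stored record."""
    return all(
        compute_checksum(record["path"], algorithm) == record[algorithm]
        for algorithm in (PRIMARY_ALGORITHM, SECOND_ALGORITHM)
    )
```

A repository could call `verify` on a schedule (level three, fixed intervals) or after events such as a media migration (level four), and write each result to a log so that audit information can be supplied on demand.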
Reviewing the different approaches allowed the USGS to identify the elements included by each. Table 2 illustrates how the various approaches compare in terms of the elements addressed.
|Organizational Structure, Mandate, Financial Resources||X||X||X||X|
|Lifecycle Management, Preservation||X||X||X||X||X||X||X|
The amount of perceived effort to implement one of the approaches was key to USGS’ review. Table 3 details the perceived level of effort USGS would need to adopt an approach.
Prior to 2015, the USGS had no agency-wide plan or strategy to securely preserve its science records. With the organization widely dispersed across the United States, different approaches had been implemented, and a consistent, authoritative means was sought to address this fragmented methodology. In addition, there was a growing expectation that federal agencies dealing with science records have documented plans and procedures in place. To address those needs, and after careful evaluation of the pros and cons of the various approaches used by other organizations to review trusted digital repositories, the Data Preservation Sub-Committee recommended using the Data Seal of Approval/World Data System (DSA-WDS) approach to evaluate USGS trusted digital repositories. The ISO approach, while arguably the most comprehensive, is beyond anticipated USGS resources to implement. The DSA-WDS approach provides a certification path, and its anticipated cost, complexity, and usability all align well with USGS needs and capabilities. This approach also suited the requirement for a process that would be viewed as transparent, as the DSA-WDS criteria are publicly viewable and their reviews are conducted by a blind panel. The approach is laid out in a straightforward manner, allowing the technique to be easily understood by agency staff. Using an internationally recognized approach, one that is drawn from two authoritative bodies, allows the USGS to defend the choice both to USGS staff and to those outside of the USGS.
The relatively recent, combined DSA-WDS draft criteria were reviewed in February 2016. The 16 primary elements in this approach include addressing the following statements:
To address the recent federal mandates requiring U.S. federal agencies to expose and enhance access to federally funded scientific research, the USGS established a cross-agency team to develop a strategic approach for evaluating internal trusted digital repositories for managing scientific assets produced by USGS researchers. The review of multiple criteria developed by national and international organizations to evaluate digital repositories revealed the Data Seal of Approval/World Data System approach as the best option for use by the USGS. This approach will enable the USGS to ensure its repositories are robust and reliable, enabling exposure and access to USGS assets by researchers and the public.
Since 2013, several new data management policies have been developed and implemented by the USGS to preserve and enhance access to scientific assets. The establishment of criteria enabling the certification of agency Trusted Digital Repositories was an important element in ensuring that the preserved digital assets are well managed in reliable systems. The adoption of the DSA-WDS Partnership Working Group Catalogue of Common Requirements for trusted digital repository evaluation enhances the lifecycle approach the USGS has adopted to create, maintain, make accessible, and preserve its scientific endeavors.
The author would like to thank Keith Kirk, Keith Richmond, Clara Brown, Natalie Latysh, Tara Bell, and Sandra Cooper for their contributions.
The author has no competing interests to declare.
Data Archiving and Networked Services (2016). Data Seal of Approval: On-line assessment tool. Netherlands. Available at: http://www.datasealofapproval.org/en/information/guidelines/ [Last accessed 29 April 2016].
Data Seal of Approval-World Data System (2016). DSA-WDS Partnership Working Group Catalogue of Common Requirements In: Research Data Alliance. Available at: https://rd-alliance.org/groups/repository-audit-and-certification-dsa-wds-partnership-wg.html [Last accessed 29 April 2016].
International Organization for Standardization (2012). ISO 16363-2012: Audit and Certification of Trustworthy Digital Repositories. Available at: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=56510 [Last accessed 29 April 2016].
Joint Working Group of the Federal Records Council and National Archives and Records Administration (2014). Federal RIM Program Maturity Model User’s Guide. Available at: https://www.archives.gov/records-mgmt/prmd.html [Last accessed 29 April 2016].
National Archives and Records Administration (2002). Archival Storage Standards. Available at: https://www.archives.gov/files/foia/directives/nara1571.pdf [Last accessed 29 December 2016].
Office of Management and Budget (2013). Open Data Policy – Managing Information as an Asset. Washington, DC. Available at: https://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf [Last accessed 29 April 2016].
Office of Science and Technology Policy (2013). Increasing Access to the Results of Federally Funded Scientific Research. Washington, DC. Available at: https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf [Last accessed 29 April 2016].
Peng, G, Privette, J, Kearns, E, Ritchey, N and Ansari, S (2015). A Unified Framework for Measuring Stewardship Practices Applied to Digital Environmental Datasets. Data Science Journal 13 (2 February 2015). DOI: https://doi.org/10.2481/dsj.14-049. Available at: http://datascience.codata.org/articles/abstract/10.2481/dsj.14-049/ [Last accessed 29 April 2016].
Phillips, M, Bailey, J, Goethals, A and Owens, T (2013). The NDSA Levels of Digital Preservation: An Explanation and Uses. Available at: http://ndsa.org/documents/NDSA_Levels_Archiving_2013.pdf [Last accessed 2 May 2016].
Whyte, A (2015). Where to keep research data: DCC Checklist for Evaluating Data Repositories. v.1 Edinburgh: Digital Curation Centre. Available at: http://www.dcc.ac.uk/resources/how-guides [Last accessed 29 April 2016].