From AEAStat staff:
Request for Public Comment on Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research
The White House Office of Science and Technology Policy is seeking public comments on a draft set of desirable characteristics of data repositories used to locate, manage, share, and use data resulting from Federally funded research. The purpose of this effort is to identify and help Federal agencies provide more consistent information on desirable characteristics of data repositories for data subject to agency Public Access Plans and data management and sharing policies, whether those repositories are operated by government or non-governmental entities. Optimization and improved consistency in agency-provided information for data repositories are expected to reduce the burden for researchers. Feedback obtained through this Request for Comments (RFC) will help to inform coordinated agency action.
Comments will be accepted until 11:59 p.m. ET on March 17, 2020.
The Subcommittee on Open Science (SOS) of the National Science and Technology Council's Committee on Science (https://www.whitehouse.gov/ostp/nstc/) convenes more than twenty Federal departments and agencies (hereafter “agencies”) that support research and development (R&D). It aims to advance open science and foster implementation of agency Public Access Plans that were developed in response to the 2013 White House Office of Science and Technology Policy (OSTP) memorandum entitled “Increasing Access to the Results of Federally Funded Scientific Research” that called for improved access to data and publications resulting from Federally funded R&D.
One goal of the Subcommittee's efforts is to improve the consistency of guidelines and best practices that agencies provide about the long-term preservation of data from Federally funded research, including suitable repositories for preserving and providing access to such data, considering agency missions, best practices, and relevant standards.
In support of its work, the SOS has developed a proposed set of desirable characteristics of data repositories for data resulting from Federally funded research. The proposed characteristics could apply to repositories operated by government or non-governmental entities. They draw from agency experience in developing and supporting data repositories and build on existing information for selecting repositories that agencies developed as part of their public access policies. Through public comment, the SOS aims to refine and develop a common set of characteristics that Federal R&D-funding agencies can use to support their Public Access and data sharing efforts.
DRAFT Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded or Supported Research
I. Desirable Characteristics for All Data Repositories
A. Persistent Unique Identifiers: Assigns datasets a citable, persistent unique identifier (PUID), such as a digital object identifier (DOI) or accession number, to support data discovery, reporting (e.g., of research progress), and research assessment (e.g., identifying the outputs of Federally funded research). The PUID points to a persistent landing page that remains accessible even if the dataset is de-accessioned or no longer available.
B. Long-term Sustainability: Has a long-term plan for managing data, including guaranteeing long-term integrity, authenticity, and availability of datasets; builds on a stable technical infrastructure and funding plans; and has contingency plans to ensure data are available and maintained during and after unforeseen events.
C. Metadata: Ensures datasets are accompanied by metadata sufficient to enable discovery, reuse, and citation of datasets, using a schema that is standard to the community the repository serves.
D. Curation & Quality Assurance: Provides, or has a mechanism for others to provide, expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata.
E. Access: Provides broad, equitable, and maximally open access to datasets, as appropriate, consistent with legal and ethical limits required to maintain privacy and confidentiality.
F. Free & Easy to Access and Reuse: Makes datasets and their metadata accessible free of charge in a timely manner after submission and with broadest possible terms of reuse or documented as being in the public domain.
G. Reuse: Enables tracking of data reuse (e.g., through assignment of adequate metadata and PUID).
H. Secure: Provides documentation of meeting accepted criteria for security to prevent unauthorized access or release of data, such as the criteria described in the International Standards Organization's ISO 27001 (https://www.iso.org/isoiec-27001-information-security.html) or the National Institute of Standards and Technology's 800-53 controls (https://nvd.nist.gov/800-53).
I. Privacy: Provides documentation that administrative, technical, and physical safeguards are employed in compliance with applicable privacy, risk management, and continuous monitoring requirements.
J. Common Format: Allows datasets and metadata to be downloaded, accessed, or exported from the repository in a standards-compliant, and preferably non-proprietary, format.
K. Provenance: Maintains a detailed logfile of changes to datasets and metadata, including date and user, beginning with creation/upload of the dataset, to ensure data integrity.
II. Additional Considerations for Repositories Storing Human Data (Even if De-Identified)
A. Fidelity to Consent: Restricts dataset access to appropriate uses consistent with original consent (such as for use only within the context of research on a specific disease or condition).
B. Restricted Use Compliant: Enforces submitters' data use restrictions, such as preventing reidentification or redistribution to unauthorized users.
C. Privacy: Implements and provides documentation of security techniques appropriate for human subjects' data to protect from inappropriate access.
D. Plan for Breach: Has security measures that include a data breach response plan.
E. Download Control: Controls and audits access to and download of datasets.
F. Clear Use Guidance: Provides accompanying documentation describing restrictions on dataset access and use.
G. Retention Guidelines: Provides documentation on its guidelines for data retention.
H. Violations: Has plans for addressing violations of terms-of-use by users and data mismanagement by the repository.
I. Request Review: Has an established data access review or oversight group responsible for reviewing data use requests.