Response to request for public comments
Dr. Micah Altman
Director of Research; Head/Scientist, Program for Information Science — MIT Libraries, Massachusetts Institute of Technology
Non-Resident Senior Fellow, The Brookings Institution
(Writing in a personal capacity)
Thank you for the opportunity to comment on these recommended practices. Supplemental materials have become an increasingly important part of the scholarly record. This report provides a thoughtful framework for developing systematic practices to publish and to steward this content.
As a practicing social scientist, I have attempted to replicate and extend research in my field, and have published on the challenges of reproducibility [Altman et al. 2003]. In my role as an information scientist and administrator, I have led projects and contributed to community-wide efforts to build and maintain open infrastructure and standards for the documentation, dissemination, and preservation of research data [Altman et al. 2001; Altman & King 2007]. My contribution is made from this perspective.
A substantial proportion of supplementary materials are most naturally characterized as “data”. Data are often critical to fully understanding, evaluating, replicating, and verifying articles — perhaps more than other types of supplementary material. So the task force may wish to take note of broad-based and thoughtful commentary on data publishing that has emerged from the research community recently, such as the following:
- The National Science Board’s draft report on Digital Research Data Sharing and Management, which emphasizes the values of and challenges to open access to data.
- The NRC’s recently released prepublication report on Communicating Science and Engineering Data in the Information Age, which develops a number of recommendations that, although directed at NCSES, are readily applicable to research data management, publication, and dissemination in general. Specifically, recommendations 3-1, 3-2, 3-3, and 3-4 together represent good practice for data management and publication. Published results are more likely to be reliable when management of the data supporting them incorporates versioning, open formats and protocols, machine-actionable metadata, and management of provenance from data collection through publication. [NRC 2011]
- Numerous responses to the recent OSTP Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research [OSTP 2011], which comment on the benefits of data access and draw attention to needs, protocols, and standards for open data access and interoperability. Notably, the responses of the National Digital Stewardship Alliance, the Data-Preservation Alliance for the Social Sciences, Carnegie Mellon University, the University of California Libraries, and the Inter-university Consortium for Political and Social Research reference the need for, and successful exemplars of, community-based standards for open data dissemination, discovery, and preservation.
The intent of these responses is not to dispute the overall purpose or framework, but to identify areas of emerging standardization around treatment of data that could be used to further refine and extend the recommendations.
Inconsistent treatment of data as evidence.
The Draft’s general principle 1.3.2 states that practices must reflect the information future researchers will need to understand and build on articles today. And in most scientific articles, access to data is critical to enable another researcher to understand, assess, and extend the results. Yet 1.3.12 limits the scope of recommendation for treatment of data to the case when they are published as supplementary materials.
Related to this, the distinction made in the Draft between “integral”, “additional”, and “related” content conflates two different types of relationships between supporting information and core articles. In the Draft, “integral content” is defined as supplemental material (material included with the article) “that is essential for the full understanding of the work”; “additional content” is supplemental material that is useful for a deeper understanding of the article. Further, “related content” is content that is not included with the article/submission package. This tripartite categorization conflates evidentiary properties (whether or not one work is essential to the understanding of another) with locational/administrative properties (where the other work is and who manages it).
The Draft disclaims all publisher responsibility toward related content, does not include it in any stakeholder roles and responsibilities, and (for data) declares it out of scope. This is problematic in practice. Where data that is essential for full understanding is submitted with the article, it is treated as part of the scholarly record. But if such data is managed, published, or disseminated separately from the article, it is ignored. And when integral information is not cited or not available, the integrity of the scholarly record is necessarily weakened.
The Draft should be revised so that principle 1.3.2 takes precedence: information that is essential for full understanding of a work should be understood to be an “integral” and crucial part of the scholarly record, regardless of where that information happens to be. If integral information is not submitted or managed along with a publication, it should still be cited as evidence, and information that is cited as evidence should be accessible to the scientific community. (Detailed requirements for data citation are elaborated in Altman & King 2007, as well as in many of the reports referenced in the preamble above.) Science provides a succinct statement of this principle in its General Information for Authors: “Citations to unpublished data and personal communications cannot be used to support claims in a published paper,” and “All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.” [Science 2011]
Short- and Long-Term Access to Data
The current Draft claims that best practice is to treat rights for Integral Content in the same manner as the core article. The implication is that most supplementary data would remain restricted, which creates a barrier to extending research results and to research that integrates supplementary data from many articles. In contrast, the reports and comments referenced in the preamble above [NRC 2011; NSB 2011; OSTP 2011] emphasize the special value of open access to data. This suggests that the emerging best practice for integral data is open access and, at the very least, that integral data be distributed under terms “no more restrictive than” those of the core article.
The reports and comments above also note the critical importance of providing data in open formats, using open protocols, and accompanied by machine-actionable metadata. The current Draft does not include or reference these emerging best practices. To ensure that integral data can be meaningfully accessed (fulfilling its role in understanding, assessment, and extension), the practices identified in other data-sharing recommendations should also be adopted in the Draft.
Altman, M., Gill, J., & McDonald, M. (2003). Numerical issues in statistical computing for the social scientist. New York: John Wiley & Sons.
Altman, M., & King, G. (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib Magazine, 13(3/4). Available from: http://www.dlib.org/dlib/march07/altman/03altman.html
Data-PASS 2011. “Response to Office of Science and Technology Policy Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research”. Available from: http://www.data-pass.org/sites/default/files/datapass-otsp-rfi-response.pdf
NDSA 2011. “Response to Office of Science and Technology Policy Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research”. Available from: http://digitalpreservation.gov/documents/NDSA_ResponseToOSTP.pdf
NRC. (2011). Communicating Science and Engineering Data in the Information Age. National Academies Press. Available from: http://www.nap.edu/catalog.php?record_id=13282
NSB (2011). Digital Research Data Sharing and Management. (Draft) NSB-11-17. Available from: http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf
Science staff. (2011). General Information for Authors. Available from: http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml
OSTP. (2011). “Public Access to Digital Data: Public Comments”. Available from: http://www.whitehouse.gov/administration/eop/ostp/library/digitaldata