Blog

Feb 29, 7:08pm

NISO posted for comment a draft “Recommended Practice for Online Supplemental Journal Article Materials”. The following is my individual perspective on the report.

Response to request for public comments

Dr. Micah Altman
Director of Research; Head/Scientist, Program on Information Science, MIT Libraries, Massachusetts Institute of Technology
Non-Resident Senior Fellow, The Brookings Institution
(Writing in a personal capacity)

Introduction

Thank you for the opportunity to comment on these recommended practices. Supplemental materials have become an increasingly important part of the scholarly record. This report provides a thoughtful framework for developing systematic practices to publish and to steward this content.

As a practicing social scientist, I have attempted to replicate and extend research in my field, and have published on the challenges of reproducibility. [Altman et al. 2003] In my role as an information scientist and administrator, I have led projects and contributed to community-wide efforts to build and maintain open infrastructure and standards for the documentation, dissemination, and preservation of research data. [Altman et al. 2001; Altman & King 2007] My contribution is made from this perspective.

Preamble

A substantial proportion of supplementary materials are most naturally characterized as “data”. Data are often critical to fully understanding, evaluating, replicating, and verifying articles — perhaps more than other types of supplementary material. So the task force may wish to take note of broad-based and thoughtful commentary on data publishing that has emerged from the research community recently, such as the following:

  • The National Science Board’s draft report on Digital Research Data Sharing and Management, which emphasizes the values of and challenges to open access to data.
  • The NRC’s recently released prepublication report on Communicating Science and Engineering Data in the Information Age, which develops a number of recommendations that, although directed at NCSES, are readily applicable to research data management, publication, and dissemination in general. Specifically, recommendations 3-1, 3-2, 3-3, and 3-4 together represent good practice for data management and publication. Published results are more likely to be reliable when management of the data supporting them incorporates versioning, open formats and protocols, machine-actionable metadata, and management of provenance from data collection through publication. [NRC 2011]
  • Numerous responses to the recent OSTP Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research [OSTP 2011], which comment on the benefits of data access, and draw attention to needs, protocols, and standards for open data access and interoperability. Notably, the responses of the National Digital Stewardship Alliance, the Data-Preservation Alliance for the Social Sciences, Carnegie Mellon University, the University of California Libraries, and the Inter-university Consortium for Political and Social Research reference both the need for, and successful exemplars of, community-based standards for open data dissemination, discovery, and preservation.

Responses

The intent of these responses is not to dispute the overall purpose or framework, but to identify areas of emerging standardization around treatment of data that could be used to further refine and extend the recommendations.

Inconsistent treatment of data as evidence. 

The Draft’s general principle 1.3.2 states that practices must reflect the information future researchers will need to understand and build on articles today. In most scientific articles, access to data is critical to enable another researcher to understand, assess, and extend the results. Yet 1.3.12 limits the scope of the recommendations for treatment of data to the case when they are published as supplementary materials.

Related to this, the distinction made in the Draft between “integral”, “additional”, and “related” content conflates two different types of relationships among supporting information and core articles. In the Draft, “integral content” is defined as supplemental material (material included with the article) “that is essential for the full understanding of the work”; “additional content” is supplemental material that is useful for a deeper understanding of the article. Further, “related content” is content that is not included with the article/submission package. This tripartite categorization conflates evidentiary properties (whether or not one work is essential to the understanding of another) with locational/administrative properties (where the other work is and who manages it).

The Draft disclaims all publisher responsibility toward related content, does not include it in any stakeholder roles and responsibilities, and (for data) declares it out of scope. This is problematic in practice. Where data that is essential for full understanding is available, it is treated as part of the scholarly record. But if such data is managed, published, or disseminated separately from the article, it is ignored. And when integral information is not cited or not available, the integrity of the scholarly record is necessarily weakened.

The Draft should be revised so that principle 1.3.2 takes precedence: information that is essential for full understanding of a work should be understood to be an “integral” and crucial part of the scholarly record, regardless of where that information happens to be. If integral information is not submitted or managed along with a publication, it should still be cited as evidence; and information that is cited as evidence should be accessible to the scientific community. (Detailed requirements for data citation are elaborated in Altman & King 2007, as well as in many of the reports referenced in the preamble, above.) Science provides a succinct statement of this principle in its General Information for Authors: “Citations to unpublished data and personal communications cannot be used to support claims in a published paper.” and “All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.” [Science 2011]

Short and Long-Term Access to Data

The current Draft claims that best practice is to treat rights for Integral Content in the same manner as the core article. The implication is that most supplementary data would remain restricted, which creates a barrier to extending research results, and a barrier to research that integrates supplementary data from many articles. In contrast, the reports and comments referenced in the preamble above [NRC 2011; NSB 2011; OSTP 2011] emphasize the special value of open access to data. This suggests that the emerging best practice for integral data is open access, and, at the very least, that integral data be distributed under terms “no more restrictive than” the core article.

The reports and comments above also note the critical importance of providing data in open formats, using open protocols, and accompanied by machine-actionable metadata. The current Draft does not include or reference these emerging best practices. To ensure that integral data can be meaningfully accessed (fulfilling its role in understanding, assessment, and extension), the practices identified in other data-sharing recommendations should also be adopted in the Draft.

REFERENCES

Altman, M., Gill, J., & McDonald, M. (2003). Numerical issues in statistical computing for the social scientist. New York: John Wiley & Sons.

Altman, M., & King, G. (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib Magazine, 13(3/4). Available from: http://www.dlib.org/dlib/march07/altman/03altman.html

Data-PASS 2011. “Response to Office of Science and Technology Policy Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research”. Available from: http://www.data-pass.org/sites/default/files/datapass-otsp-rfi-response.pdf

NDSA 2011. “Response to Office of Science and Technology Policy Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research”. Available from: http://digitalpreservation.gov/documents/NDSA_ResponseToOSTP.pdf

NRC. (2011). Communicating Science and Engineering Data in the Information Age. National Academies Press. Available from: http://www.nap.edu/catalog.php?record_id=13282

NSB (2011).  Digital Research Data Sharing and Management. (Draft) NSB-11-17. Available from: http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf

Science staff. (2011). General Information for Authors. Available from: http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml

OSTP. (2011). “Public Access to Digital Data: Public Comments”. Available from: http://www.whitehouse.gov/administration/eop/ostp/library/digitaldata


Jan 19, 10:38am

The National Science Board offered a recent opportunity to comment on the draft report on ‘Digital Research Data Sharing and Management’ by the task force on data policies. The following is my individual perspective on the report.

Response to request for public comments

Dr. Micah Altman
Senior Research Scientist, IQSS, Harvard U. (until 2/29)
Director of Research; Head/Scientist, Program on Information Science, MIT Libraries, Massachusetts Institute of Technology (as of 3/1/2012)
Non-Resident Senior Fellow, The Brookings Institution
(Writing in a personal capacity)

Introduction

Thank you for the opportunity to respond to this report. I believe this report will advance the discussion of research data sharing and management; it raises many thoughtful questions and makes recommendations that will have a positive impact on the conduct of scientific research.

As a practicing social scientist, I have worked with collaborators to replicate and extend research in my field, and we have published on the challenges of reproducibility. [Altman et al. 2003] And in my role as an administrator, I have led projects and contributed to community-wide efforts to build and maintain open infrastructure and standards for the documentation, dissemination, and preservation of research data. [Altman et al. 2001; Altman & King 2007] My contribution is made with this perspective.

Preamble

The task force may wish to take note of broad-based and thoughtful commentary on data management that has emerged from the research community recently, such as the following:

  • Numerous responses to the recent ANPRM on proposed changes to the Common Rule commented on the relationship between data sharing and privacy. Notably, two responses by data privacy and computer science researchers provide a roadmap for simultaneously increasing data sharing and privacy protections by leveraging advances in theoretical computer science, and by establishing mechanisms for accountability and transparency in data sharing. [Sweeney et al. 2010; Vadhan et al. 2010]
  • Numerous responses to the recent OSTP Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research comment on the benefits of data access, and draw attention to protocols and standards for open data access and interoperability. Notably, the responses of the National Digital Stewardship Alliance and of the Data-Preservation Alliance for the Social Sciences point to successful exemplars of community-based standards for open data dissemination, discovery, and preservation. [NDSA 2011; Data-PASS 2011]
  • The NRC’s recently released prepublication report on Communicating Science and Engineering Data in the Information Age (which supersedes the letter report cited by the task force) develops a number of recommendations that, although directed at NCSES, are readily applicable to research data management and dissemination in general. Specifically, recommendations 3-1, 3-2, 3-3, and 3-4 together represent good practice for data management in general: published results are more likely to be reliable when management of the data supporting them incorporates versioning, open formats and protocols, machine-actionable metadata, and management of provenance from data collection through publication. [NRC 2011]

Responses

The intent of these responses is not to dispute the recommendations of the task force, but to identify areas of emerging standardization that could be used to further refine and extend the recommendations.

Recommendation 1, and the discussion related to it, calls for NSF to provide leadership in policy development, notes the diversity of stakeholder communities, and cautions against one-size-fits-all solutions. This point is well-taken, as each discipline should be empowered to set priorities for embargo policies, documentation standards, and the like. Nevertheless, as the NDSA recommendations emphasize, some baseline requirements should be applied to all research data management:

Notwithstanding, there are still baseline conditions or requirements that apply to all data regardless of discipline, particularly as they relate to archiving and preservation. For most data, “open access” is needed not only for the short term, but for the long term. And scientific disciplines have focused primarily on short-term access. There are critical standards for metadata exchange, fixity information and verification, and persistent citation that can support long-term access to data, preservation, and the long-term reproducibility of public results. [NDSA 2011]

Recommendation 2 calls for grantees to make data, methods and techniques available to verify and extend figures, tables, findings, and conclusions. The recommendation also notes that data should be shared using persistent electronic identifiers.

This point is also well-taken, and would greatly accelerate scientific progress in many fields. The task force may also wish to consider the emerging body of work that demonstrates that scientific publications should, in addition to including persistent identifiers for data, treat references to data in a manner consistent with references to other scientific works — publications should include full citations to data in the standard reference section, and these should be indexed along with other references. [Data-PASS 2011; Altman & King 2007]

Recommendations 4 and 5, and the discussion related to them, emphasize the need for the stakeholders to convene and explore business models; the need for an expansion of sustainable data management; and the lack of sufficient standards and business models.

This is clearly right. Nevertheless, a number of successful standards and models have emerged in different communities, which the task force may wish to consider as exemplars. Moreover, it is important to note that standards and business models alone are insufficient. As the NDSA response points out, it is critical that the capability for data management be demonstrated, rather than asserted:

“Memory institutions such as archives, libraries and museums have an extensive track record with these functions and collaborative organizations such as NDSA could serve the essential purpose of developing or implementing frameworks that thoroughly test and certify assertions.” [NDSA 2011]

REFERENCES

Altman, M., Gill, J., & McDonald, M. (2003). Numerical issues in statistical computing for the social scientist. New York: John Wiley & Sons.

Altman, M., & King, G. (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib Magazine, 13(3/4). Available from: http://www.dlib.org/dlib/march07/altman/03altman.html

Data-PASS 2011. “Response to Office of Science and Technology Policy Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research”. Available from: http://www.data-pass.org/sites/default/files/datapass-otsp-rfi-response.pdf

NDSA 2011. “Response to Office of Science and Technology Policy Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research”. Available from: http://digitalpreservation.gov/documents/NDSA_ResponseToOSTP.pdf

NRC. (2011). Communicating Science and Engineering Data in the Information Age. National Academies Press. Available from: http://www.nap.edu/catalog.php?record_id=13282

Sweeney, L., et al. 2010. “Comments from Data Privacy Researchers”. Available from: http://dataprivacylab.org/projects/irb/DataPrivacyResearchers.pdf

Vadhan, S., et al. 2010. “Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections”. Available from: http://dataprivacylab.org/projects/irb/Vadhan.pdf


Oct 23, 4:51pm
Prepared for Campbell Public Affairs Institute, Syracuse U.

Sep 15, 2:59pm

Data Sharing & Data Citation

Prepared for “Data Coding, Analysis, Archiving, and Sharing for Open Collaboration”, National Science Foundation, Sept 15-16, 2011


Sep 13, 11:57am

Information, Technology, and the Research Library: http://prezi.com/q4rvkuu-6_ki/information-technology-and-the-research-library/

This is a presentation I gave at the MIT Libraries.

