Data Science

Altman M, Borgman C, Crosas M, Martone M. An Introduction to the Joint Principles for Data Citation. Bulletin of the Association for Information Science and Technology [Internet]. 2015;41(3):43-44.
Data citation is rapidly emerging as a key practice supporting data access, sharing, and reuse, as well as sound and reproducible scholarship. Consensus data citation principles, articulated through the Joint Declaration of Data Citation Principles [3], represent an advance in the state of the practice and a new consensus on citation.
Allen L, Scott J, Brand A, Hlava M, Altman M. Publishing: Credit where credit is due. Nature [Internet]. 2014;508(7496):312-313.
Altman M, Archer P, Borgman C, Brand A, Brase J, Callaghan S, Cass K, Carroll B, Cohen D, deWaard A, et al. Joint Principles for Data Citation. Data Synthesis Task Group; 2014.
Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse. In support of this assertion, and to encourage good practice, we offer a set of guiding principles for data within scholarly literature, another dataset, or any other research object.
Altman M, Arnaud E, Borgman C, Callaghan S, Brase J, Carpenter T, Chavan V, Cohen D, Hahnel M, Helly J, et al. Out of Cite, Out of Mind: The Current State of Practice, Policy and Technology for Data Citation. Data Science Journal [Internet]. 2013;12:1–75.
The relatively new practice of making bibliographic references to data sets with formal citations begins to address long-standing problems limiting our collective ability to locate data and to reuse them effectively in advancing science. References made and citations received support a research infrastructure that provides the necessary recognition and reward of data work, in addition to providing attribution detail, facilitating future access, and fostering cross-collaboration and investigation. They are the links between the data and the published research results needed to maintain the integrity of the scientific method. Some research funders have begun to require that publicly funded research data be deposited with various data centers. As these practices become better established, the ability to detect, locate, obtain, and understand the data from prior research will be circumscribed by our ability to have a sufficient description of those data: a citation. Based on a review of emerging practices and analysis of existing literature on citation practices, we have identified a set of “first principles” for data citation.
Altman M. Mitigating Threats To Data Quality Throughout the Curation Lifecycle. In: Curating For Quality: Ensuring Data Quality to Enable New Science; 2012:1–119.
Data quality criteria implied by the candidate frameworks are neither easily harmonized nor readily quantified. Thus, a generalized systematic approach to evaluating data quality seems unlikely to emerge soon. Fortunately, developing an effective approach to digital curation that respects data quality does not require a comprehensive definition of data quality. Instead, we can appropriately address “data quality” in curation by limiting our consideration to narrower applied questions: Which aspects of data quality are (potentially) affected by (each stage of) digital curation activity? And how do we keep data quality properties invariant at each curation stage? A number of approaches seem particularly likely to bear fruit: incorporating portfolio diversification in selection and appraisal; supporting validation of preservation quality attributes such as authenticity, integrity, organization, and chain of custody throughout long-term preservation and use, from ingest through delivery and creation of derivative works; and applying semantic fingerprints for quality evaluation during ingest, format migration, and delivery. These approaches have the advantage of being independent of the content subject area, the domain of measure, and the particular semantic content of objects and collections, so they are broadly applicable. By mitigating these broad-spectrum threats to quality, we can improve the overall quality of curated collections and their expected value to target communities.
Altman M. Computational Modeling. In: Kurian GT, ed. The Encyclopedia of Political Science. CQ Press; 2011. pp. 291–292.
In political economy, computational models are used to simulate the behavior of institutions or individuals. Researchers use these models to explore emergent patterns in the behavior of individuals and institutions over time. Computational models are used as a complement to mathematical models, and as a form of independent theory construction in their own right.
Novak K, Altman M, Broch E, Carroll JM, Clemins PJ, Fournier D, Laevart C, Reamer A, Meyer EA, Plewes T. Communicating Science and Engineering Data in the Information Age. National Academies Press; 2011.
The National Center for Science and Engineering Statistics (NCSES) of the National Science Foundation (NSF) communicates its science and engineering (S&E) information to data users in a very fluid environment, one undergoing modernization at a pace at which data-producer dissemination practices, protocols, and technologies, on the one hand, and user demands and capabilities, on the other, are changing faster than the agency has been able to accommodate.
Altman M. A Fingerprint Method for Scientific Data Verification. In: Proceedings of the International Conference on Systems Computing Sciences and Software Engineering 2007. New York: Springer Netherlands; 2008:311–316.
This article discusses an algorithm (called "UNF") for verifying digital data matrices. This algorithm is now used in a number of software packages and digital library projects. We discuss the details of the algorithm, and offer an extension for normalization of time and duration data.
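The core idea behind a UNF-style fingerprint is to canonicalize each value before hashing, so that semantically equivalent representations (0.5 versus 0.50000, or the same number stored at different precisions) yield the same signature. The sketch below illustrates that idea only; it is not the published UNF specification, which fixes its own rounding, encoding, truncation, and versioning rules:

```python
import base64
import hashlib

def canonical(value, digits=7):
    # Render numbers at a fixed number of significant digits in
    # exponential form, so equivalent values normalize identically.
    # (Illustrative rule; the real UNF spec defines its own.)
    if isinstance(value, (int, float)):
        return format(float(value), f".{digits - 1}e")
    return str(value)

def fingerprint(column):
    # Hash the canonical forms with per-element delimiters, then
    # truncate and base64-encode into a short printable signature.
    h = hashlib.sha256()
    for v in column:
        h.update(canonical(v).encode("utf-8") + b"\n\x00")
    return base64.b64encode(h.digest()[:16]).decode("ascii")
```

Because the fingerprint depends only on canonical content, it remains stable when the same data are re-saved in a different file format, which is what makes it usable for citation-time verification.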
Altman M, King G. Overview of a proposed standard for the scholarly citation of quantitative data. IASSIST Quarterly [Internet]. 2006;30:18–19.
Critical components of the scholarly and library community are use of a common language and universal standards for scholarly citations and credit attribution, to enable the location and retrieval of articles and books. We present a proposal for a similar universal standard for citing quantitative data that retains the advantages of print citations, adds other components made possible by, and needed due to, the digital form and systematic nature of quantitative datasets, and is consistent with most existing subfield-specific approaches. Although the digital library field includes numerous creative ideas, we limit ourselves to only those elements that appear ready for easy practical use by scientists, journal editors, publishers, librarians, and archivists.
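The components such a citation combines, conventional human-readable elements plus a persistent global identifier and a content fingerprint (UNF), could be assembled along these lines. The field set and output formatting here are a hypothetical sketch for illustration, not the proposal's normative syntax:

```python
from dataclasses import dataclass

@dataclass
class DataCitation:
    # Hypothetical field set: bibliographic elements plus the two
    # machine-actionable components the proposal emphasizes.
    authors: str
    year: int
    title: str
    identifier: str  # persistent global identifier, e.g. a DOI or Handle
    unf: str         # content fingerprint (UNF) of the dataset

    def render(self):
        """Format the citation as a single reference string."""
        return (f'{self.authors} ({self.year}). "{self.title}", '
                f"{self.identifier}, {self.unf}")
```

The identifier makes the data locatable even if it moves; the fingerprint lets a reader verify that the data retrieved are the data cited.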