Data citation is rapidly emerging as a key practice supporting data access, sharing, and reuse, as well as sound and reproducible scholarship. Consensus data citation principles, articulated in the Joint Declaration of Data Citation Principles, represent an advance in the state of the practice and a new consensus on citation.
Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.
In support of this assertion, and to encourage good practice, we offer a set of guiding principles for citing data within scholarly literature, another dataset, or any other research object.
The relatively new practice of making bibliographic references to data sets with formal citations begins to address long-standing problems that limit our collective ability to locate data and to reuse them effectively in advancing science. References made and citations received support a research infrastructure that provides the necessary recognition and reward for data work, in addition to supplying attribution detail, facilitating future access, and fostering cross-collaboration and investigation. They are the links between the data and the published research results needed to maintain the integrity of the scientific method. Some research funders have begun to require that publicly funded research data be deposited with various data centers. As these practices become better established, the ability to detect, locate, obtain, and understand the data from prior research will depend on having a sufficient description of those data: a citation. Based on a review of emerging practices and an analysis of the existing literature on citation practices, we have identified the following set of “first principles” for data citation:
Data quality criteria implied by the candidate frameworks are neither easily harmonized nor readily quantified. Thus, a generalized, systematic approach to evaluating data quality seems unlikely to emerge soon. Fortunately, developing an effective approach to digital curation that respects data quality does not require a comprehensive definition of data quality. Instead, we can appropriately address “data quality” in curation by limiting our consideration to narrower applied questions: Which aspects of data quality are (potentially) affected by (each stage of) digital curation activity? And how do we keep data quality properties invariant at each curation stage? Several approaches seem particularly likely to bear fruit: (1) incorporate portfolio diversification in selection and appraisal; (2) support validation of preservation quality attributes such as authenticity, integrity, organization, and chain of custody throughout long-term preservation and use, from ingest through delivery and the creation of derivative works; and (3) apply semantic fingerprints for quality evaluation during ingest, format migration, and delivery. These approaches have the advantage of being independent of the content subject area, the domain of measure, and the particular semantic content of objects and collections, so they are broadly applicable. By mitigating these broad-spectrum threats to quality, we can improve the overall quality of curated collections and their expected value to target communities.
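One common way to support validation of integrity and chain of custody across curation stages is to record a cryptographic fixity digest at ingest and re-verify it after each subsequent stage (migration, delivery, derivative creation). The sketch below is only illustrative of that general practice, not a description of any particular repository's implementation; the function names are our own.

```python
import hashlib
from pathlib import Path

def fixity_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute a SHA-256 digest of a stored object, streamed in chunks
    so that large files do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Re-check an object against the digest recorded at ingest;
    a mismatch signals loss of integrity somewhere in the chain."""
    return fixity_digest(path) == recorded_digest
```

A curation workflow would store the digest alongside provenance metadata at ingest and call `verify_fixity` after every format migration or transfer.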
In political economy, computational models are used to simulate the behavior of institutions or individuals. Researchers use these models to explore emergent patterns in the behavior of individuals and institutions over time. Computational models serve as a complement to mathematical models and as a form of independent theory construction in their own right.
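To make the idea concrete, a minimal agent-based simulation might look like the toy sketch below, in which agents repeatedly adopt the majority choice of a small random sample of peers and a population-level pattern (convergence toward one choice) emerges from purely local rules. This is a generic illustration of the modeling style, not any specific model from the literature.

```python
import random

def simulate(n_agents: int = 100, steps: int = 200, seed: int = 42) -> float:
    """Toy agent-based model: each step, one random agent adopts the
    majority choice among 5 randomly sampled peers. Returns the final
    share of agents holding choice 1."""
    rng = random.Random(seed)
    choices = [rng.randint(0, 1) for _ in range(n_agents)]
    for _ in range(steps):
        i = rng.randrange(n_agents)
        sample = [choices[rng.randrange(n_agents)] for _ in range(5)]
        choices[i] = 1 if sum(sample) >= 3 else 0  # imitate the majority
    return sum(choices) / n_agents
```

Running the simulation across seeds and parameter settings, rather than solving equations analytically, is what distinguishes this mode of theory construction from a closed-form mathematical model.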
The National Center for Science and Engineering Statistics (NCSES) of the National Science Foundation (NSF) communicates its science and engineering (S&E) information to data users in a highly fluid environment undergoing modernization: data producers' dissemination practices, protocols, and technologies, on the one hand, and user demands and capabilities, on the other, are changing faster than the agency has been able to accommodate.
This article discusses an algorithm, the Universal Numeric Fingerprint ("UNF"), for verifying digital data matrices. The algorithm is now used in a number of software packages and digital library projects. We discuss the details of the algorithm and offer an extension for normalizing time and duration data.
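The core idea behind such verification is normalize-then-hash: canonicalize each value so that incidental representation details (trailing zeros, floating-point noise, storage format) do not change the result, then hash the canonical sequence. The sketch below illustrates only that general idea with rounding to a fixed number of significant digits; it is deliberately simplified and does not implement the actual UNF specification's canonical forms or versioned encoding.

```python
import base64
import hashlib

def fingerprint(column, digits: int = 7) -> str:
    """Simplified normalize-then-hash sketch (NOT the UNF spec):
    numbers are canonicalized as rounded scientific notation, other
    values as UTF-8 text, and the sequence is hashed with SHA-256."""
    h = hashlib.sha256()
    for v in column:
        if isinstance(v, (int, float)):
            canon = f"{float(v):.{digits - 1}e}"  # e.g. '1.000000e+00'
        else:
            canon = str(v)
        h.update(canon.encode("utf-8") + b"\x00")  # null-delimit values
    return base64.b64encode(h.digest()).decode("ascii")[:22]
```

Because values are rounded before hashing, two copies of a dataset that differ only below the chosen precision yield the same fingerprint, while any substantive change to the values produces a different one.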
A common language and universal standards for scholarly citation and credit attribution are critical to the scholarly and library communities, enabling the location and retrieval of articles and books. We present a proposal for a similar universal standard for citing quantitative data that retains the advantages of print citations; adds other components made possible by, and needed because of, the digital form and systematic nature of quantitative datasets; and is consistent with most existing subfield-specific approaches. Although the digital library field includes numerous creative ideas, we limit ourselves to those elements that appear ready for easy practical use by scientists, journal editors, publishers, librarians, and archivists.
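As a rough illustration of how such a citation combines print-style fields with digital-era additions, the hypothetical container below pairs the familiar author/year/title elements with a persistent global identifier and a dataset fingerprint. The class, its field names, and the example values are our own illustration, not the standard's normative format.

```python
from dataclasses import dataclass

@dataclass
class DataCitation:
    """Hypothetical container for core data-citation elements:
    print-style fields plus a persistent identifier and a content
    fingerprint tying the citation to the exact dataset cited."""
    author: str
    year: int
    title: str
    identifier: str   # e.g. a DOI or Handle (persistent, resolvable)
    fingerprint: str  # e.g. a UNF, fixing the dataset's exact content

    def format(self) -> str:
        """Render the elements as a single citation string."""
        return (f"{self.author}, {self.year}, \"{self.title}\", "
                f"{self.identifier}, {self.fingerprint}")
```

The identifier supports locating the data in the future even if URLs change, while the fingerprint lets a reader verify that the data retrieved are the data cited.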