On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data privacy research. The resulting conversations centered on the emerging tools and approaches from the participants’ various disciplines and how they should be integrated in the context of real-world use cases that involve the management of confidential research data. This workshop report, the first in a series, provides an overview of the long-term longitudinal study use case. Long-term longitudinal studies collect, at multiple points over a long period of time, highly specific and often sensitive data describing the health, socioeconomic, or behavioral characteristics of human subjects. The value of such studies lies in part in their ability to link a set of behaviors and changes to each individual, but these factors tend to make the combination of observable characteristics associated with each subject unique and potentially identifiable. Using the research information lifecycle as a framework, this report discusses the defining features of long-term longitudinal studies and the associated challenges for researchers tasked with collecting and analyzing such data while protecting the privacy of human subjects. It also describes the disclosure risks and common legal and technical approaches currently used to manage confidentiality in longitudinal data. Finally, it identifies urgent problems and areas for future research to advance the integration of various methods for preserving confidentiality in research data.
The reforms to Florida's redistricting process, catalyzed by advances in information technology, enabled a dramatic increase in public participation in redistricting. This reform process in Florida can be considered a partial success: the adopted plan implements one of the most efficient observable trade-offs among the reformers' criteria, primarily along the lines of racial representation, by creating an additional Black-majority district in the form of the current 5th Congressional District. This does not mean, however, that reform was entirely successful. The adopted plan is efficient, but it is atypical of the plans submitted by the legislature and the public. Based on the pattern of public submissions, and on contextual information, we suspect the adopted plan was drawn with partisan motivations. The public preference and good-government criteria might be better served by the selection of other efficient plans that were much more competitive and less biased, at the cost of a reduction in majority-minority representation.
The NDSA National Agenda for Digital Stewardship integrates the perspective of dozens of experts and hundreds of institutions, convened through the Library of Congress, to provide funders and executive decision‐makers insight into emerging technological trends, gaps in digital stewardship capacity, and key areas for funding, research and development to ensure that today's valuable digital content remains accessible and comprehensible in the future, supporting a thriving economy, a robust democracy, and a rich cultural heritage.
This new edition of the Agenda builds on earlier work, updating the 2014 report and highlighting new areas of focus, specifically the selection and preservation of content at scale. It also more clearly articulates the need for an evidence base for efficient and reliable digital preservation practice. It further outlines recent gains in, and observations on, the technical infrastructure required for large-scale digital stewardship and the supporting policies and organizational structures. The report synthesizes the latest issues for funders, researchers, and organizational leaders and provides actionable recommendations for practitioners.
Our work leads us to conclude that no one can have complete information and no single group can, on its own, create fair electoral maps. Legislative gerrymandering is not the answer, but as Americans turn toward independent commissions, why not deploy all technologies available to facilitate the widest possible participation in districting choices critical to American democracy?
Data citation is rapidly emerging as a key practice in support of data access, sharing, reuse, and of sound and reproducible scholarship. In this article we review the evolution of data citation standards and practices – to which Sue Dodd was an early contributor – and the core principles of data citation that have emerged through a collaborative synthesis. We then discuss an example of the current state of the practice, and identify the remaining implementation challenges.
Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.
In support of this assertion, and to encourage good practice, we offer a set of guiding principles for citing data within scholarly literature, another dataset, or any other research object.
Recent technological advances have enabled greater public participation and transparency in the United States redistricting process. We review these advances, with particular attention to activities involving open-source redistricting software.
The relatively new practice of making bibliographic references to data sets with formal citations begins to address long-standing problems limiting our collective ability to locate data and to reuse them effectively in advancing science. References made and citations received support a research infrastructure to provide the necessary recognition and reward of data work, in addition to providing attribution detail, facilitating future access, and fostering cross-collaboration and investigation. They are the links between the data and the published research results needed to maintain the integrity of the scientific method. Some research funders have begun to require that publicly funded research data be deposited with various data centers. As these practices become better established, the ability to detect, locate, obtain, and understand the data from prior research will depend on having a sufficient description of those data: a citation. Based on a review of emerging practices and analysis of existing literature on citation practices, we have identified a set of “first principles” for data citation.
The 2014 National Agenda for Digital Stewardship highlights emerging technological trends, identifies gaps in digital stewardship capacity, and provides funders and decision‐makers with insight into the work needed to ensure that today's valuable digital content remains accessible, useful and comprehensible in the future, supporting a thriving economy, a robust democracy, and a rich cultural heritage. It is meant to inform, rather than replace, individual organizational efforts, planning, goals, or opinions. It offers inspiration and guidance and suggests potential directions and areas of inquiry for research and future work in digital stewardship.
The structure and design of digital storage systems is a cornerstone of digital preservation. To better understand ongoing storage practices of organizations committed to digital preservation, the National Digital Stewardship Alliance conducted a survey of member organizations. This article reports on the findings of the survey. The results of the survey provide a frame of reference for organizations to compare their storage system approaches with NDSA member organizations.
Over the past fifty years, the battle lines in Virginia redistricting have shifted from within-party fighting among Democrats, primarily over malapportionment favoring rural interests over urban interests, to battles over voting rights. In this article, we provide a detailed history of redistricting in Virginia and a quantitative analysis of current adopted and proposed redistricting plans. Surprisingly, although the outcome remained partisan, the current round of redistricting included an unprecedented level of public engagement, catalyzed by information technology. The Virginia commission and the participation of students in the current round of Virginia’s redistricting demonstrate that redistricting does not have to be left up to the ‘professionals.’ Further, our analysis suggests that state-level reform in the form of an independent commission that strictly follows a set of administrative criteria would likely modestly benefit Republicans.
Participative technology has succeeded beyond our expectations. The number of legally viable, publicly submitted plans has grown by a factor of a hundred since the last decade. These plans demonstrate a qualitative difference in public participation and have produced many examples of better ways of redistricting.
Baker v. Carr’s elevation of new population equality criteria for redistricting over old geographic-based criteria reflected an evolution in how the courts and society understood the principles of representation. Twenty-first century principles of redistricting should reflect modern understandings of representation and good government—and also reflect the new opportunities and constraints made possible through advancing technology and data collection.
Gerrymandering is a form of political boundary delimitation, or redistricting, in which the boundaries are selected to produce an outcome that is improperly favorable to some group. The name “gerrymander” was first used by the Boston Gazette in 1812 to describe the shape of Massachusetts Governor Elbridge Gerry’s redistricting plan, in which one district was said to have resembled a salamander. In the United States, congressional and legislative redistricting occurs every 10 years, following the decennial census. The aim of redistricting is to assign voters to equipopulous geographical districts from which they will elect representatives, in order to reflect communities of interest and to improve representation.
Data quality criteria implied by the candidate frameworks are neither easily harmonized nor readily quantified. Thus, a generalized systematic approach to evaluating data quality seems unlikely to emerge soon. Fortunately, developing an effective approach to digital curation that respects data quality does not require a comprehensive definition of data quality. Instead, we can appropriately address “data quality” in curation by limiting our consideration to two narrower, applied questions: Which aspects of data quality are (potentially) affected by (each stage of) digital curation activity? And how do we keep data quality properties invariant at each curation stage? A number of approaches seem particularly likely to bear fruit: incorporating portfolio diversification in selection and appraisal; supporting validation of preservation quality attributes such as authenticity, integrity, organization, and chain of custody throughout long-term preservation and use, from ingest through delivery and the creation of derivative works; and applying semantic fingerprints for quality evaluation during ingest, format migration, and delivery. These approaches have the advantage of being independent of the content subject area, the domain of measure, and the particular semantic content of objects and collections, so they are broadly applicable. By mitigating these broad-spectrum threats to quality, we can improve the overall quality of curated collections and their expected value to target communities.
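The integrity validation mentioned above is commonly implemented in preservation systems as fixity checking: a cryptographic digest recorded at ingest is re-computed and compared at each later stage (storage audit, format migration, delivery). The following is a minimal sketch of that idea in Python; the file name and payloads are hypothetical, and real repositories would persist digests in preservation metadata rather than in local variables.

```python
import hashlib
from pathlib import Path


def fixity(path: Path) -> str:
    """Return the SHA-256 digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, recorded_digest: str) -> bool:
    """Re-compute fixity and compare against the digest recorded at ingest."""
    return fixity(path) == recorded_digest


if __name__ == "__main__":
    # Hypothetical preserved object: record fixity at ingest...
    obj = Path("example_object.bin")
    obj.write_bytes(b"digital object payload")
    digest_at_ingest = fixity(obj)

    # ...later audits re-verify; an unchanged object passes.
    assert verify(obj, digest_at_ingest)

    # A silent change (e.g., bit rot) is detected by the mismatch.
    obj.write_bytes(b"corrupted payload")
    assert not verify(obj, digest_at_ingest)
    obj.unlink()
```

This check establishes only bit-level integrity; the other attributes named in the abstract (authenticity, organization, chain of custody) require additional metadata and audit records beyond a digest comparison.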