Publications by Year: Working Paper

Working Paper
Nissim K, Steinke T, Wood A, Altman M, Bembenek A, Bun M, Gaboardi M, O'Brien DR, Vadhan S. Differential Privacy: A Primer for a Non-Technical Audience. Presented at Privacy Law Scholars Conference 2017. Working Paper.
Differential privacy is a formal mathematical framework for guaranteeing privacy protection when analyzing or releasing statistical data. Recently emerging from the theoretical computer science literature, differential privacy is now in the initial stages of implementation and use in various academic, industry, and government settings. This document is a primer on differential privacy. Using intuitive illustrations and limited mathematical formalism, this primer provides an introduction to differential privacy for non-technical practitioners, who are increasingly tasked with making decisions with respect to differential privacy as it grows more widespread in use. In particular, the examples in this document illustrate ways in which social science and legal audiences can conceptualize the guarantees provided by differential privacy with respect to the decisions they make when managing personal data about research subjects and informing them about the privacy protection they will be afforded.
Altman M, Wood A, O'Brien D, Gasser U. Practical Approaches to Big Data Privacy Over Time, in The Brussels Privacy Symposium: Final Papers. Brussels: Future of Privacy Forum; Working Paper.
Increasingly, governments and businesses are collecting, analyzing, and sharing detailed information about individuals over long periods of time. Vast quantities of data from new sources and novel methods for large-scale data analysis promise to yield deeper understanding of human characteristics, behavior, and relationships and advance the state of science, public policy, and innovation. At the same time, the collection and use of fine-grained personal data over time is associated with significant risks to individuals, groups, and society at large. In this article, we examine a range of long-term data collections, conducted by researchers in social science, in order to identify the characteristics of these programs that drive their unique sets of risks and benefits. We also examine the practices that have been established by social scientists to protect the privacy of data subjects in light of the challenges presented in long-term studies. We argue that many uses of big data, across academic, government, and industry settings, have characteristics similar to those of traditional long-term research studies. In this article, we discuss the lessons that can be learned from longstanding data management practices in research and potentially applied in the context of newly emerging data sources and uses.
Alstad Z, Dahlstrom-Hakki I, Asbell-Clarke J, Rowe E, Altman M. The Use of Multidimensional Biopsychological Markers to Detect Learning in Educational Gaming Environments. Working Paper.
This project explores how multidimensional bio-psychological measures are used to understand the cognitive aspects of student learning in STEM (Science, Technology, Engineering and Math) focused educational games. Furthermore, we seek to articulate a method for how learning events can be automatically analyzed using these tools. Given the complexity and difficulty of finding externalized markers of learning as it happens, it is evident that more robust measures could benefit this process. The work reported here, with funding from a National Science Foundation grant (NSF DRL-1417456), aims to incorporate more diverse measures of behavior and physiology in order to create a more complete assessment of learning and cognition in a game-based environment. Tools used in this project include eye tracking systems, heart rate sensors, and tools for detecting electrodermal activity (EDA), temperature, and movement data. Findings indicated both the utility of more varied measures and the need for more precise tools for synchronization of diverse data streams.
O'Brien D, Ullman J, Altman M, Gasser U, Bar-Sinai M, Nissim K, Vadhan S, Wojcik MJ, Wood A. When is Information Purely Public? Social Science Research Network [Internet]. Working Paper.
Researchers are increasingly obtaining data from social networking websites, publicly-placed sensors, government records and other public sources. Much of this information appears public, at least to first impressions, and it is capable of being used in research for a wide variety of purposes with seemingly minimal legal restrictions. The insights about human behaviors we may gain from research that uses this data are promising. However, members of the research community are questioning the ethics of these practices, and at the heart of the matter are some difficult questions about the boundaries between public and private information. This workshop report, the second in a series, identifies selected questions and explores issues around the meaning of “public” in the context of using data about individuals for research purposes.
Altman M, Amos B, McDonald MP, Smith D. Revealing Preferences: Why Gerrymanders are Hard to Prove, and What to Do about It. Social Science Research Network [Internet]. Working Paper.
Gerrymandering requires illicit intent. We classify six proposed methods to infer the intent of a redistricting authority using a formal framework for causal inferences that encompasses the redistricting process from the release of census data to the adoption of a final plan. We argue all proposed techniques to detect gerrymandering can be classified within this formal framework. Courts have, at one time or another, weighed evidence using one or more of these methods to assess racial or partisan gerrymandering claims. We describe the assumptions underlying each method, raising some heretofore unarticulated critiques revealed by laying bare their assumptions. We then review how these methods were employed in the 2014 Florida district court ruling that the state legislature violated a state constitutional prohibition on partisan gerrymandering, and propose standards that advocacy groups and courts can impose upon redistricting authorities to ensure they are held accountable if they adopt a partisan gerrymander.
Altman M, Magar E, McDonald MP, Trelles A. The Effects of Automated Redistricting and Partisan Strategic Interaction on Representation: The Case of Mexico. Social Science Research Network [Internet]. Working Paper.
In the U.S., redistricting is deeply politicized and often synonymous with gerrymandering: the manipulation of boundaries to promote the goals of parties, incumbents, and racial groups. In contrast, Mexico’s federal redistricting has been implemented nationwide since 1996 through automated algorithms devised by the electoral management body (EMB) in consultation with political parties. In this setting, parties interact strategically and generate counterproposals to the algorithmically generated plans in a closed-door process that is not revealed outside the bureaucracy. Applying geospatial statistics and large-scale optimization to a novel dataset that has never been available outside of the EMB, we analyze the effects of automated redistricting and partisan strategic interaction on representation. Our dataset comprises the entire set of plans generated by the automated algorithm, as well as all the counterproposals made by each political party during the 2013 redistricting process. Additionally, we inspect the 2006 map with new data and two proposals to replace it towards 2015 in search of partisan effects and political distortions. Our analysis offers a unique insight into the internal workings of a purportedly autonomous EMB and the partisan effects of automated redistricting on representation.
Wood A, O'Brien D, Altman M, Karr A, Gasser U, Bar-Sinai M, Nissim K, Ullman J, Vadhan S, Wojcik MJ. Integrating Approaches to Privacy Across the Research Lifecycle: Long-Term Longitudinal Studies. Social Science Research Network [Internet]. Working Paper.
On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data privacy research. The resulting conversations centered on the emerging tools and approaches from the participants’ various disciplines and how they should be integrated in the context of real-world use cases that involve the management of confidential research data. This workshop report, the first in a series, provides an overview of the long-term longitudinal study use case. Long-term longitudinal studies collect, at multiple points over a long period of time, highly-specific and often sensitive data describing the health, socioeconomic, or behavioral characteristics of human subjects. The value of such studies lies in part in their ability to link a set of behaviors and changes to each individual, but these factors tend to make the combination of observable characteristics associated with each subject unique and potentially identifiable. Using the research information lifecycle as a framework, this report discusses the defining features of long-term longitudinal studies and the associated challenges for researchers tasked with collecting and analyzing such data while protecting the privacy of human subjects. It also describes the disclosure risks and common legal and technical approaches currently used to manage confidentiality in longitudinal data. Finally, it identifies urgent problems and areas for future research to advance the integration of various methods for preserving confidentiality in research data.