Working Paper
Okulicz-Kozaryn A, Altman M. The Energy Paradox: Energy Use and Happiness. Working Paper.

It is widely claimed that there is a substantial tradeoff between energy preservation and human wellbeing. We are reluctant to cut energy consumption for fear of a decline in our happiness. Despite technological advances, Earth’s per capita energy use continues to grow. The environmental consequences are well known: resource depletion, pollution, and global warming. Here we studied the relationship between energy consumption and happiness across four decades and multiple levels of geography. Surprisingly, we found that received wisdom is false: for counties, states, and nations, energy consumption is neither necessary for wellbeing nor linked directly to it. The relation between energy use and happiness is very similar to the relation between economic growth and happiness, i.e., the Easterlin Paradox.

Alstad Z, Dahlstrom-Hakki I, Asbell-Clarke J, Rowe E, Altman M. The Use of Multidimensional Biopsychological Markers to Detect Learning in Educational Gaming Environments. Working Paper.

This project explores how multidimensional bio-psychological measures are used to understand the cognitive aspects of student learning in STEM (Science, Technology, Engineering and Math) focused educational games. Furthermore, we seek to articulate a method for how learning events can be automatically analyzed using these tools. Given the complexity and difficulty of finding externalized markers of learning as it happens, it is evident that more robust measures could benefit this process. The work reported here, funded by National Science Foundation grant NSF DRL-1417456, aims to incorporate more diverse measures of behavior and physiology in order to create a more complete assessment of learning and cognition in a game-based environment. Tools used in this project include eye tracking systems, heart rate sensors, and tools for detecting electrodermal activity (EDA), temperature, and movement data. Findings indicated both the utility of more varied measures and the need for more precise tools for synchronization of diverse data streams.

O'Brien D, Ullman J, Altman M, Gasser U, Bar-Sinai M, Nissim K, Vadhan S, Wojcik MJ, Wood A. When is Information Purely Public? Social Science Research Network [Internet]. Working Paper.
Researchers are increasingly obtaining data from social networking websites, publicly-placed sensors, government records and other public sources. Much of this information appears public, at least to first impressions, and it is capable of being used in research for a wide variety of purposes with seemingly minimal legal restrictions. The insights about human behaviors we may gain from research that uses this data are promising. However, members of the research community are questioning the ethics of these practices, and at the heart of the matter are some difficult questions about the boundaries between public and private information. This workshop report, the second in a series, identifies selected questions and explores issues around the meaning of “public” in the context of using data about individuals for research purposes.
Altman M, Amos B, McDonald MP, Smith D. Revealing Preferences: Why Gerrymanders are Hard to Prove, and What to Do about It. Social Science Research Network [Internet]. Working Paper.

Gerrymandering requires illicit intent. We classify six proposed methods to infer the intent of a redistricting authority using a formal framework for causal inferences that encompasses the redistricting process from the release of census data to the adoption of a final plan. We argue all proposed techniques to detect gerrymandering can be classified within this formal framework. Courts have, at one time or another, weighed evidence using one or more of these methods to assess racial or partisan gerrymandering claims. We describe the assumptions underlying each method, raising some heretofore unarticulated critiques revealed by laying bare their assumptions. We then review how these methods were employed in the 2014 Florida district court ruling that the state legislature violated a state constitutional prohibition on partisan gerrymandering, and propose standards that advocacy groups and courts can impose upon redistricting authorities to ensure they are held accountable if they adopt a partisan gerrymander.

Altman M, Magar E, McDonald MP, Trelles A. The Effects of Automated Redistricting and Partisan Strategic Interaction on Representation: The Case of Mexico. Social Science Research Network [Internet]. Working Paper.
In the U.S., redistricting is deeply politicized and often synonymous with gerrymandering: the manipulation of boundaries to promote the goals of parties, incumbents, and racial groups. In contrast, Mexico’s federal redistricting has been implemented nationwide since 1996 through automated algorithms devised by the electoral management body (EMB) in consultation with political parties. In this setting, parties interact strategically and generate counterproposals to the algorithmically generated plans in a closed-door process that is not revealed outside the bureaucracy. Applying geospatial statistics and large-scale optimization to a novel dataset that has never been available outside of the EMB, we analyze the effects of automated redistricting and partisan strategic interaction on representation. Our dataset comprises the entire set of plans generated by the automated algorithm, as well as all the counterproposals made by each political party during the 2013 redistricting process. Additionally, we inspect the 2006 map with new data, and two proposals to replace it towards 2015, in search of partisan effects and political distortions. Our analysis offers a unique insight into the internal workings of a purportedly autonomous EMB and the partisan effects of automated redistricting on representation.
Chassanoff A, Borghi J, AlNoamany Y, Thornton K. Software Curation in Research Libraries: Practice and Promise. The Journal of Librarianship and Scholarly Communication. 2018;Forthcoming.

INTRODUCTION. Research software plays an increasingly vital role in the scholarly record. Academic research libraries are in the early stages of exploring strategies for curating and preserving research software, aiming to provide long-term access and use. DESCRIPTION OF PROGRAM. In 2016, the Council on Library and Information Resources (CLIR) began offering postdoctoral fellowships in software curation. Four institutions hosted the initial cohort of software curation fellows. This article describes the work activities and research program of the cohort, highlighting the challenges and benefits of doing this exploratory work in research libraries. NEXT STEPS. Academic research libraries are poised to play an important role in research and development around robust services for software curation. The next cohort of CLIR fellows is set to begin in fall 2018 and will likely shape and contribute substantially to an emergent research agenda.

How big data challenges privacy, and how science can help. The Washington DC 100 [Internet]. 2018;May 8.
The collection of personal information has become broader and more threatening than anyone could have imagined. Our research finds traditional approaches to safeguarding privacy are stretched to the limit as thousands of data points are collected about us every day and maintained indefinitely by a host of technology platforms.
Altman M, Cohen A, Fluitt A, Nissim K, Washington M, Wood A. Comments on New Techniques and Methodologies for Combining Data from Multiple Sources. Office of Management and Budget. 2018.

Comments in response to the Request for Information, "New Techniques and Methodologies Based on Combining Data from Multiple Sources."

Altman M, Wood A. How big data challenges privacy, and how science can help. The Washington DC 100 [Internet]. 2018;May.

The collection of personal information has become broader and more threatening than anyone could have imagined. Our research finds traditional approaches to safeguarding privacy are stretched to the limit as thousands of data points are collected about us every day and maintained indefinitely by a host of technology platforms.

Altman M, Vayena E, Wood A. A Harm-Reduction Framework for Algorithmic Fairness. IEEE Privacy and Security. 2018;Forthcoming.

In this article we recognize the profound effects that algorithmic decision-making can have on people’s lives and propose a harm-reduction framework for algorithmic fairness. We argue that any evaluation of algorithmic fairness must take into account the foreseeable effects that algorithmic design, implementation, and use have on the well-being of individuals. We further demonstrate how counterfactual frameworks for causal inference developed in statistics and computer science can be used as the basis for defining and estimating the foreseeable effects of algorithmic decisions. Finally, we argue that certain patterns of foreseeable harms are unfair. An algorithmic decision is unfair if it imposes predictable harms on sets of individuals that are unconscionably disproportionate to the benefits these same decisions produce elsewhere. Also, an algorithmic decision is unfair when it is regressive, i.e., when members of disadvantaged groups pay a higher cost for the social benefits of that decision.

Hellyar D, Walsh R, Altman M. Improving digital experience through modeling the human experience: The resurgence of ‘Virtual’- (& ‘Augmented’- & ‘Mixed’-) Reality. In: Reconceptualizing Libraries. Routledge Press; 2018.

This essay is designed generally to introduce information professionals and researchers to the topic of VR, to characterize its potential to enhance human experiences, and to identify the concepts that are critical to its application. The essay is also intended specifically for professional librarians, and applied library information science researchers, who aim to integrate new interface technologies and design concepts into library systems.

Nissim K, Steinke T, Wood A, Altman M, Bembenek A, Bun M, Gaboardi M, O'Brien DR, Vadhan S. Differential Privacy: A Primer for a Non-Technical Audience. Vanderbilt Journal of Entertainment and Technology Law. 2018;Forthcoming.

Differential privacy is a formal mathematical framework for guaranteeing privacy protection when analyzing or releasing statistical data. Recently emerging from the theoretical computer science literature, differential privacy is now in the initial stages of implementation and use in various academic, industry, and government settings.

This document is a primer on differential privacy. Using intuitive illustrations and limited mathematical formalism, this primer provides an introduction to differential privacy for non-technical practitioners, who are increasingly tasked with making decisions with respect to differential privacy as it grows more widespread in use. In particular, the examples in this document illustrate ways in which social science and legal audiences can conceptualize the guarantees provided by differential privacy with respect to the decisions they make when managing personal data about research subjects and informing them about the privacy protection they will be afforded.

Altman M, Wood A, O'Brien D, Gasser U. Practical Approaches to Big Data Privacy Over Time. International Journal of Data Privacy Law [Internet]. 2018.

Increasingly, governments and businesses are collecting, analyzing, and sharing detailed information about individuals over long periods of time. Vast quantities of data from new sources and novel methods for large-scale data analysis promise to yield deeper understanding of human characteristics, behavior, and relationships and advance the state of science, public policy, and innovation. At the same time, the collection and use of fine-grained personal data over time is associated with significant risks to individuals, groups, and society at large. In this article, we examine a range of long-term data collections, conducted by researchers in social science, in order to identify the characteristics of these programs that drive their unique sets of risks and benefits. We also examine the practices that have been established by social scientists to protect the privacy of data subjects in light of the challenges presented in long-term studies. We argue that many uses of big data, across academic, government, and industry settings, have characteristics similar to those of traditional long-term research studies. In this article, we discuss the lessons that can be learned from longstanding data management practices in research and potentially applied in the context of newly emerging data sources and uses.

Altman M, McDonald M. Why redistricting should not be left to a mathematical formula alone. LSE US Centre [Internet]. 2017.
In new research, Micah Altman and Michael P. McDonald find that there are limitations to such a formula-based approach, especially given that there is no consensus on which one is a good measure of representation. Instead, they propose that formulas be used alongside open and transparent systems that support public participation in the redistricting process.
Gallinger M, Bailey J, Cariani K, Owens T, Altman M. Trends in Digital Preservation Capacity and Practice: Results from the 2nd Bi-annual National Digital Stewardship Alliance Storage Survey. D-Lib [Internet]. 2017;23(7/8).

Research and practice in digital preservation require a solid foundation of evidence of what is being protected and what practices are being used. The National Digital Stewardship Alliance (NDSA) storage survey provides a rare opportunity to examine the practices of most major US memory institutions. The repeated, longitudinal design of the NDSA storage surveys offers a rare opportunity to detect trends within and among preservation institutions more reliably than the typical surveys of digital preservation, which are based on one-time measures and convenience (Internet-based) samples. The survey was conducted in 2011 and in 2013. The results from these surveys have revealed notable trends, including continuity of practice within organizations over time, growth rates of content exceeding predictions, shifts in content availability requirements, and limited adoption of best practices for interval fixity checking and the Trusted Digital Repositories (TDR) checklist. Responses from new memory organizations increased the variety of preservation practice reflected in the survey responses.


Castro E, Crosas M, Garnett A, Sheridan K, Altman M. Evaluating and Promoting Open Data Practices in Open Access Journals. Journal of Scholarly Publishing. 2017;Forthcoming.

In the last decade there has been a dramatic increase in attention from the scholarly communications and research community to open access (OA) and open data practices. These are potentially related, because journal publication policies and practices both signal disciplinary norms and provide direct incentives for data sharing and citation. However, there is little research evaluating the data policies of OA journals. In this study, we analyze the state of data policies in open access journals by employing random sampling of the Directory of Open Access Journals (DOAJ) and Open Journal Systems (OJS) journal directories, and applying a coding framework that integrates both previous studies and emerging taxonomies of data sharing and citation. This study, for the first time, reveals the low prevalence of data sharing policies and practices in OA journals, which differs from the findings of previous studies of commercial journals in specific disciplines.


Magar E, Trelles A, Altman M, McDonald MP. Components of partisan bias originating from single-member districts in multi-party systems: An application to Mexico. Political Geography [Internet]. 2017;57(1):1-12.

We extend the estimation of the components of partisan bias (undue advantage conferred to some party in the conversion of votes into legislative seats) to single-member district systems in the presence of multiple parties. Extant methods to estimate the contributions to partisan bias from malapportionment, boundary delimitations, and turnout are limited to two-party competition. In order to assess the spatial dimension of multi-party elections, we propose an empirical procedure combining three existing approaches: a separation method (Grofman et al. 1997), a multi-party estimation method (King 1990), and Monte Carlo simulations of national elections (Linzer 2012). We apply the proposed method to the study of recent national lower chamber elections in Mexico. Analysis uncovers systematic turnout-based bias in favor of the former hegemonic ruling party that has been offset by district geography substantively helping one or both other major parties.

Altman M, McDonald MP. Redistricting by Formula: An Ohio Reform Experiment. American Politics Research [Internet]. 2017;Forthcoming.

We analyze sixty-six Ohio congressional plans produced during the post-2010 census redistricting by the legislature and the public. The public drew many plans submitted for judging in a competition hosted by reform advocates, who awarded a prize to the plan that scored best on a formula composed of four permissive components: compactness, respect for local political boundaries, partisan fairness, and competition. We evaluate how the legislature’s adopted plan compares to these plans on the advocates’ criteria and our alternative set of criteria, which reveals the degree to which the legislature placed partisanship over these other criteria. Our evaluation reveals minimal trade-offs among the components of the overall competition’s scoring criteria, but we caution that the scoring formula may be sensitive to implementation choices among its components. Compared to the legislature’s plan, the reform community can get more of the four criteria they value, importantly without sacrificing the state’s only African-American opportunity congressional district.

Chassanoff A, Woods K, Lee C. Digital Preservation Metadata Practice for Disk Image Access. In: Implementing PREMIS. Berlin: Springer International Publishing; 2016. pp. 99-109.


Many libraries, archives, and museums are now regularly acquiring, processing, and analyzing born-digital materials. Materials exist on a variety of source media, including flash drives, hard drives, floppy disks, and optical media. Extracting disk images (i.e., sector-by-sector copies of digital media) is an increasingly common practice. It can be essential to ensuring provenance, original order, and chain of custody. Disk images allow users to explore and interact with the original data without risk of permanent alteration. These replicas help institutions to safeguard against modifications to underlying data that can occur when a file system contained on a storage medium is mounted, or a bootable medium is powered up. Retention of disk images can substantially reduce preservation risks. Digital storage media become progressively difficult (or impossible) to read over time, due to “bit rot,” obsolescence of media, and reduced availability of devices to read them. Simply copying the allocated files off a disk and discarding the storage carrier, however, can be problematic. The ability to access and render the content of files can depend upon the presence of other data that resided on the disk. These dependencies are often not obvious upon first inspection and may only be discovered after the original medium is no longer readable or available. Disk images also enable a wide range of potential access approaches, including dynamic browsing of disk images (Misra S, Lee CA, Woods K (2014) A Web Service for File-Level Access to Disk Images. Code4Lib Journal, 25 [3]) and emulation of earlier computing platforms. Disk images often contain residual data, which may consist of previously hidden or deleted files (Redwine G, et al. in Born digital: guidance for donors, dealers, and archival repositories. Council on Library and Information Resources, Washington, 2013 [4]). Residual data can be valuable for scholars interested in learning about the context of creation. 
Traces of activities undertaken in the original environment—for example, identifying removable media connected to a host machine or finding contents of browser caches—can provide additional sources of information for researchers and facilitate the preservation of materials (Woods K, et al. in Proceedings of the 11th annual international ACM/IEEE joint conference on digital libraries, pp. 57–66, 2011 [5]). Digital forensic tools can be used to create disk images in a wide range of formats. These include raw files (such as those produced by the Unix tool dd). Quantifying successes and failures for many tools can require judgment calls by qualified digital curation professionals. Verifying a checksum for a file is a simple case; the checksums either match or are different. In the events described in the previous sections, however, the conditions for success are fuzzier. For example, fiwalk will often “successfully” complete whether or not it is able to extract a meaningful record of the contents of file system(s) on a disk image. Likewise, bulk_extractor will simply report items of interest it has discovered. Knowing whether this output is useful (and whether it has changed between separate executions of a given tool) depends on comparison of the output between the two runs, information not currently recorded in the PREMIS document. In the BitCurator implementation, events are often recorded as having completed, rather than as having succeeded, to avoid ambiguity. Future iterations of the implementation may include more nuanced descriptions of event outcomes.