Information Privacy

Altman M, Capps C, Prevost R. Location Confidentiality and Official Surveys. Social Science Research Network [Internet]. 2016.
In general, the growth of big data sources has changed the threat landscape of privacy and statistics in at least three major ways. First, when surveys were initially founded as the principal source of statistical information, whether one participated in a survey was largely unknown. Now, as government record systems and corporate big data sources that include all or a large portion of a given universe are increasingly used, that privacy protection is eroded. Second, in the past, little outside information was generally available to match with published summaries. Now the ubiquity of auxiliary information enables many more inferences from summary data. Third, in the past, typical privacy attacks relied on linking outside data through well-known public characteristics -- PII or BII. Now, datasets can be linked through behavioral fingerprints. The current state of the practice in privacy lags well behind the state of the art in this area. Most commercial organizations, and most NSOs in other countries, continue to rely (at most) on traditional aggregation and suppression methods to protect privacy – with no formal analysis of privacy loss or of the utility of the information gathered. The U.S. Census Bureau, because of its size, institutional capacity, and strong reputation for privacy protection, could establish leadership in modernizing privacy practices.

IAP: Managing Confidential Research Data

Apr 25, 12:00pm to 3:00pm



This class covers research design and research methods related to confidential information. We’ll discuss how to recognize sensitive information; prepare for IRB review; reduce risks in data collection; evaluate information threats and vulnerabilities; organize and store sensitive data; understand data use agreements; and create data management plans. If you’re a researcher, whether a late-career grad student, faculty, or professional research staff, this class is for you.


Vayena E, Gasser U, Wood A, O'Brien D, Altman M. Elements of a New Ethical and Regulatory Framework for Big Data Research. Washington and Lee Law Review [Internet]. 2016;72(3):420-442.
Vast quantities of data about individuals are increasingly being created by services such as mobile apps and online social networks and through methods such as DNA sequencing. These data are quite rich, containing a large number of fine-grained data points related to human biology, characteristics, behaviors, and relationships over time.
Wood A, Airoldi E, Altman M, de Montjoye Y, Gasser U, O'Brien D, Vadhan S. Privacy Tools project response to Common Rule Notice of Proposed Rule Making. Comments on Regulation.Gov [Internet]. 2016.
This is a Comment on the Department of Health and Human Services (HHS) Proposed Rule: Federal Policy for the Protection of Human Subjects. We recognize the exciting research opportunities enabled by new data sources and technologies for collecting, analyzing, and sharing data about individuals. With the ability to collect and analyze massive quantities of data related to human characteristics, behaviors, and interactions, researchers are increasingly able to explore phenomena in finer detail and with greater confidence. At the same time, a major challenge for realizing the full potential of these recent advances will be protecting the privacy of human subjects. Approaches to privacy protection in common use in both research and industry contexts often provide limited real-world privacy protection. We believe institutional review boards (IRBs) and investigators require new guidance to inform their selection and implementation of appropriate measures for privacy protection in human subjects research. Therefore, we share many of the same concerns and rec

Location Confidentiality and Official Surveys -- Second Census-MIT Big Data Workshop

Nov 30, 12:00pm to Dec 01, 5:00pm


E25-202 MIT, Cambridge, MA

Based on mobile devices alone, commercial entities have the potential to collect extensive, fine-grained, continuous, and identifiable records of a person's location and movement history, accompanied by a partial record of other mobile devices (potentially linked to people) encountered over that history. This information is increasingly used for commercial purposes, such as targeted advertising, and for scientific research.

Workshop on ICT4D Principle 8: Address Privacy & Security in Development Programs

May 08, 8:00am to 12:30pm


UN Headquarters, NYC

Dr Altman will present the opening introduction framing Privacy and Security concepts:

In recent years, ICTs such as mobile phone apps, web-based mapping technologies and data-mining techniques have been incorporated into development programmes, ranging from public health campaigns to assisting farming communities, to re-uniting refugees. However, the opportunities presented by these innovations simultaneously raise valid concerns about privacy and data security. 

Registration is available through the UN Global Pulse site.

NISO Patron Privacy Virtual Meetings

May 06, 10:00am to 1:00pm

Dr Altman will present a short talk on Privacy and Security concepts as part of the NISO Patron Privacy Virtual Meeting series:

This NISO project, supported by the Mellon Foundation, involves a series of community discussions on how libraries, publishers, and information systems providers can build better privacy protection into their operations. The project will also support creation of a draft framework to support patron privacy and subsequent publicity of the draft prior to its advancement for approval as a NISO Recommended Practice.

O'Brien D, Ullman J, Altman M, Gasser U, Bar-Sinai M, Nissim K, Vadhan S, Wojcik MJ, Wood A. When is Information Purely Public? Social Science Research Network [Internet]. Working Paper.
Researchers are increasingly obtaining data from social networking websites, publicly-placed sensors, government records and other public sources. Much of this information appears public, at least to first impressions, and it is capable of being used in research for a wide variety of purposes with seemingly minimal legal restrictions. The insights about human behaviors we may gain from research that uses this data are promising. However, members of the research community are questioning the ethics of these practices, and at the heart of the matter are some difficult questions about the boundaries between public and private information. This workshop report, the second in a series, identifies selected questions and explores issues around the meaning of “public” in the context of using data about individuals for research purposes.

Managing Confidential Data (IAP Class)

Oct 29, 9:00am to 1:00pm

This tutorial provides a framework for identifying and managing confidential information in research. It is most appropriate for mid- to late-career graduate students, faculty, and professional research staff who actively engage in the design and planning of research. The course will provide an overview of the major legal requirements governing confidential research data and the core technological measures used to safeguard data. It will also provide an introduction to the statistical methods and software tools used to analyze and limit disclosure risks.


Wood A, O'Brien D, Altman M, Karr A, Gasser U, Bar-Sinai M, Nissim K, Ullman J, Vadhan S, Wojcik MJ. Integrating Approaches to Privacy Across the Research Lifecycle: Long-Term Longitudinal Studies. Social Science Research Network [Internet]. Working Paper.
On September 24-25, 2013, the Privacy Tools for Sharing Research Data project at Harvard University held a workshop titled "Integrating Approaches to Privacy across the Research Data Lifecycle." Over forty leading experts in computer science, statistics, law, policy, and social science research convened to discuss the state of the art in data privacy research. The resulting conversations centered on the emerging tools and approaches from the participants’ various disciplines and how they should be integrated in the context of real-world use cases that involve the management of confidential research data. This workshop report, the first in a series, provides an overview of the long-term longitudinal study use case. Long-term longitudinal studies collect, at multiple points over a long period of time, highly specific and often sensitive data describing the health, socioeconomic, or behavioral characteristics of human subjects. The value of such studies lies in part in their ability to link a set of behaviors and changes to each individual, but these factors tend to make the combination of observable characteristics associated with each subject unique and potentially identifiable. Using the research information lifecycle as a framework, this report discusses the defining features of long-term longitudinal studies and the associated challenges for researchers tasked with collecting and analyzing such data while protecting the privacy of human subjects. It also describes the disclosure risks and common legal and technical approaches currently used to manage confidentiality in longitudinal data. Finally, it identifies urgent problems and areas for future research to advance the integration of various methods for preserving confidentiality in research data.