What’s new in managing confidential research data this year?
For MIT’s Independent Activities Period (IAP), the Program on Information Science regularly leads a practical workshop on managing confidential data. The workshop is in part a result of research through the Privacy Tools project. As I was updating it for this semester, I had an opportunity to reflect on what’s new on the pragmatic side of managing confidential information.
Most notably, because of the publicity surrounding the NSA, more people, including people in higher places, are paying attention. (And as an information scientist, I note that one benefit of the NSA scandal is that everyone now recognizes the term “metadata.”)
Also, more generally, personal information continues to become more available, and it is increasingly easy to link that information to individuals. New laws, regulations, and policies governing information privacy continue to emerge, increasing the complexity of management. Trends in information collection and management (cloud storage, “big” data, and debates about the right to limit access to published but personal information) complicate data management and make traditional approaches to managing confidential data decreasingly effective.
- On the do-it-yourself front, the increasing flexibility of the FISMA-certified Amazon Web Services GovCloud makes running a remote, secure research computing environment easier and more economical. Such an environment is still complex and expensive to maintain, however, and one still has to trust Amazon, although the FISMA certifications make that trust better justified.
- The second widely used option, combining file-sharing services like Dropbox with encrypted filesystems like TrueCrypt, also received a boost this year with the success of a crowdfunded effort to independently audit the TrueCrypt source. This is good news, and the transparency and verifiability of TrueCrypt are its big strengths. The approach remains limited in practice to secure publishing of information: it doesn’t support simultaneous remote updates (not unless you like filesystem corruption), multiple keys for different users or portions of the filesystem, key distribution, and so on.
- A number of simpler solutions have emerged this year.
– BitTorrent Sync provides “secure” P2P replication and sharing based on a shared secret key.
– SpiderOak Hive, Sync.com, and BoxCryptor all offer zero-knowledge, client-side encrypted cloud storage and data sharing. The ease of use and functionality of these systems for secure collaboration is very attractive compared to the other available solutions. BoxCryptor offers an especially wide range of enterprise features, such as key distribution, revocation, and master and group key chaining, that would make managing sharing among heterogeneous groups easier. However, the big downside is the amount of “magic” in these systems. None is open source, nor is any sufficiently well documented (at least externally) or certified (no FISMA there) to engender trust among us untrusting folk… (Although SpiderOak in particular seems to have a good reputation for trustworthiness, and the others no doubt have pure hearts, I’d rest easier with the ability to audit source code, peer-reviewed algorithms, etc.)
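For readers who want to see what “zero-knowledge, client-side encrypted” means in practice, here is a minimal sketch of the general idea, not a description of how any of the products above actually work. The key is derived from a passphrase on the client, and only the salt, nonce, ciphertext, and integrity tag would ever be uploaded. Everything here is an assumption made for illustration: the function names, the PBKDF2 work factor, and above all the cipher, which is a toy keystream built from SHAKE-256 so that the example runs with the Python standard library alone. A real tool should use a vetted, audited construction (for example AES-GCM from a maintained library) instead.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple

# Assumed work factor, chosen only for illustration.
PBKDF2_ROUNDS = 200_000


def derive_keys(passphrase: str, salt: bytes) -> Tuple[bytes, bytes]:
    """Derive separate encryption and MAC keys on the client."""
    material = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, PBKDF2_ROUNDS, dklen=64
    )
    return material[:32], material[32:]


def keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHAKE-256: a stand-in for a vetted cipher."""
    return hashlib.shake_256(enc_key + nonce).digest(length)


def encrypt_for_upload(passphrase: str, plaintext: bytes) -> Dict[str, bytes]:
    """Encrypt locally; the returned blob is all the provider would store."""
    salt = secrets.token_bytes(16)
    nonce = secrets.token_bytes(16)
    enc_key, mac_key = derive_keys(passphrase, salt)
    stream = keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    # Encrypt-then-MAC: the tag covers everything that gets uploaded.
    tag = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).digest()
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext, "tag": tag}


def decrypt_after_download(passphrase: str, blob: Dict[str, bytes]) -> bytes:
    """Verify the tag, then decrypt; fails on a wrong passphrase or tampering."""
    enc_key, mac_key = derive_keys(passphrase, blob["salt"])
    expected = hmac.new(
        mac_key, blob["salt"] + blob["nonce"] + blob["ciphertext"], hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong passphrase or tampered data")
    stream = keystream(enc_key, blob["nonce"], len(blob["ciphertext"]))
    return bytes(c ^ s for c, s in zip(blob["ciphertext"], stream))
```

The point of the sketch is the trust boundary: the passphrase and derived keys never leave the client, so the provider cannot read the data. It also hints at why the “magic” matters, since without source code or documentation an outsider cannot check that a closed system really keeps the keys on the client this way.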
For those interested in the meat of the course, which gives an overview of legal, policy, information technology/security, research design, and statistical pragmatics, the new slides are here: