Jun 21, 1:01pm

Can Computers be Feminist? — Comments on a Brown Bag talk by Gillian Smith

Gillian Smith, who is an Assistant Professor in Art+Design and Computer Science at Northeastern University, gave this talk, entitled Can Computers be Feminist? Procedural Politics and Computational Creativity, as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated through the slides below, Gillian presented a perspective on computing as a co-creation of developer, algorithm, data, and user. The talk argued that developers embed specific ethical and epistemic commitments within the software they produce through their selection of algorithms and data.

In her abstract, Gillian summarizes as follows:

Computers are increasingly taking on the role of a creator—making content for games, participating on twitter, generating paintings and sculptures. These computationally creative systems embody formal models of both the product they are creating and the process they follow. Like that of their human counterparts, the work of algorithmic artists is open to criticism and interpretation, but such analysis requires a framework for discussing the politics embedded in procedural systems. In this talk, I will examine the politics that are (typically implicitly) represented in computational models for creativity, and discuss the possibility for incorporating feminist perspectives into their underlying algorithmic design.

The talk referenced a wide range of provocative games, tools, and concepts. Particularly intriguing examples included:

  • A strategy game called ThreadSteading, developed by Smith and collaborators at Disney Research Pittsburgh, that is played with cloth tiles and that, at the end of the game, is mechanically sewn into a quilt. The game thus transforms strategy (gameplay) into information (the trace of the play, as reflected by the final state of the board) and then into art.
  • A work of computational art, developed by Smith, mapping from one space to another — from color -> emotion -> shape.
  • Alice & Kev, a model of a homeless, abusive family created within The Sims, which highlights both what behaviors can emerge from the model of “life” embedded in The Sims and what that model clearly elides.
  • Instances of creative software such as TinyGallery and DeepForger, and software frameworks such as Tracery, which facilitate generating creative content that blends algorithmic and human choices.
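
To give a flavor of how grammar-based generators blend authored and algorithmic choices, here is a minimal sketch in Python. It is not Tracery's API. It only borrows the idea of `#symbol#` placeholders, and the `expand` function and the toy grammar are assumptions made for illustration.

```python
import random
import re

# Hypothetical toy grammar: a human authors the rules,
# and the algorithm chooses among them at generation time.
GRAMMAR = {
    "origin": ["A #adjective# #artwork# about #theme#."],
    "adjective": ["quilted", "procedural", "hand-drawn"],
    "artwork": ["game", "painting", "sculpture"],
    "theme": ["color", "emotion", "shape"],
}

PLACEHOLDER = re.compile(r"#(\w+)#")


def expand(symbol, grammar, rng):
    """Pick one rule for `symbol` and recursively expand its #placeholders#."""
    rule = rng.choice(grammar[symbol])
    return PLACEHOLDER.sub(lambda m: expand(m.group(1), grammar, rng), rule)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sentences
    for _ in range(3):
        print(expand("origin", GRAMMAR, rng))
```

Even in this toy, the author's commitments are visible in the data: the generator can only ever say what the rule lists allow.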

The full range of topics covered is impossible to summarize concisely; I recommend that readers follow the links and references in the slides.

The talk raised a number of themes: how software must be understood as a complex interaction among information (data), structure (software), and behavior (use); how games and software embed epistemic and ethical models of the player/user, context, and society; and how authorship and labor entwine complex relationships among authors and pay, claim, credit, and work.

Dr Smith’s talk also raised a number of provocative questions (which I paraphrase): How can software support richer identity models incorporating a broad spectrum of gender and sexuality? How can diversity in software authorship be achieved? How do we surface and evaluate the biases implicit in software? And what mechanisms beyond content filtering can we use to mitigate these biases? How do we assign responsibility for software, algorithms, and the resulting outputs? How can we integrate empathy into software, algorithms, and data systems?

Smith’s talk claims that one of the societal goals that art serves is to transmit core values by creating emotion. I have also heard it said, and believe, that part of the power of art is its ability to engage us in, and communicate to us, the true emotional complexity of life.

Although art and information science have different goals, each acts as both a mirror and a lens to the ethical values and epistemic commitments of the cultures and institutions within which it is embedded. Broadly conceived, Smith’s provocations apply also to the development of library software, collections, and services. The research of the Program on Information Science has connected with these questions at a number of points, and we aspire to engage more generally in the future by furthering our field’s understanding of how library systems can reflect and support diverse perspectives; can shed light on the biases embedded in information systems, services, and collections; and can incorporate within them understandings of emotion, embodiment, and identity.

May 21, 4:56pm

Science photographer Felice Frankel, who is a research scientist in the Center for Materials Science and Engineering at the Massachusetts Institute of Technology, gave this talk on The Visual Component: More Than Pretty Pictures, as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated through the slides below, Felice argued that images and figures are first-class intellectual objects and should be considered just as important as words in publication, learning, and thinking.

In her abstract, Felice summarizes as follows:

Visual representation of all kinds are becoming more important in our ever growing image-based society, especially in science and technology.  Yet there has been little emphasis on developing standards in creating or critiquing those representations.  We must begin to consider images as more than tangential components of information and find ways to seamlessly search for accurate and honest depictions of complex scientific phenomena.  I will discuss a few ideas to that end and show my own process of making visual representations in sciences and engineering.  I will also make the case that representations are just as “intellectual” as text.

The talk presented many visual representations from a huge variety of scientific domains and projects. Across these projects, the talk returned to a number of key themes.

  • When you develop a visual representation, it is vital to identify the overall purpose of the graphic (for example, whether it is explanatory or exploratory); the key ideas that the representation should communicate about the science; and the context in which the representation will be viewed.
  • A number of components of visual design are universal across subject domains, including composition, abstraction, coloring, and layering. Small, incremental refinements can dramatically improve the quality of a representation.
  • The process of developing visual representations engages both students and researchers in critical thinking about science; and this process can be used as a mechanism for research collaboration.
  • Representations are not the science; they are communications of the science, and all representations involve design and manipulation. Maintaining scientific integrity requires transparency about what is included, what is excluded, and what manipulations were used in preparing the representation.


In my observation, information visualization is becoming increasingly popular, and tools for creating visualizations are increasingly accessible to a broad set of contributors. Universities would benefit from supporting students and faculty in visual design for research, publication, and teaching, and from supporting the discovery and curation of collections of representations.

Library engagement in this area is nascent, and there are many possible routes for engagement. Library support for scientific representations is often limited, especially compared to the support for PDF documents or bibliographic citations. I speculate that there are at least five productive avenues for involvement.

  1. Libraries could provide support for researchers in curating personal collections of representations; in sharing them for collaboration; and in publishing them as part of research and educational content. Further, researchers have increasing opportunities to cycle between physical and virtual representations of information, so support for curating information representations can dovetail with library support for making and makerspaces.
  2. Library information systems seldom incorporate information visualization effectively in support of resource discovery and navigation. New information and visualization technologies and methods offer increased opportunities to make libraries more accessible and more engaging.
  3. Image-based searching is another area that demonstrates that search is not a solved problem. Image-based search is a powerful means of discovering content, yet it is almost completely absent from current library information systems.
  4. Visual design and communication skills are seldom explicitly documented or transmitted in the academy. Libraries have a vital role to play in making accessible the body of hidden (“tacit”) knowledge and skills that are critical for success in developing careers.
  5. Libraries have a role in helping researchers to engage in evolving systems of credit and attribution. For example, the CRediT taxonomy (which we helped to develop, and which is being adopted by scholarly journals such as Cell and PLOS) provides a way to formally record attribution for those who contribute scientific visualizations.


Apr 26, 12:05pm

Diana Hellyar, who is a Graduate Research Intern in the program, reflects on her investigations into augmented reality, virtual reality, and related technologies:

Libraries Can Use New Visualization Technology to Engage Readers

My research as a Research Intern for the MIT Libraries Program on Information Science is focused on the applications of emerging virtual reality and visualization technology to library information discovery. Virtual reality and other visualization technology is a rapidly changing field. Staying on top of these technologies and applying them in libraries can be difficult since there is little research on the topic. While I was researching the uses of virtual reality in libraries, I came across an example of how some libraries were able to incorporate augmented reality into their children’s departments. Out of a dozen examples, this one caught my attention for several reasons. It is not just a prototype; it has been used in multiple libraries. It was also easily adopted by non-technical librarians and was easy enough to be used by children.

The Mythical Maze app (available here) has been downloaded more than 10,000 times to date. Across the United Kingdom, children participated in the Reading Agency’s 2014 Summer Reading Challenge, Mythical Maze, by downloading the Mythical Maze app on their mobile devices. Liz McGettigan discusses the app in an article published on the Chartered Institute of Library and Information Professionals website, explaining how it uses augmented reality to make posters and legend cards around the library come to life. The article links to The Reading Agency’s promotional video (watch it here). The video discusses how mythical creatures are hidden around the library and how children can look for these mythical creatures with their app. If they find the creatures, they can use the app to unlock mini-games. The app also allows children to scan stickers they receive from reading books, which unlocks rewards and allows children to learn more about the mythical creatures.

Using apps and integrating augmented reality is a fun way to run a summer reading challenge. The Reading Agency reported that 2014 was a record-breaking year for its program. It states that participation increased by 3.6% and that 81,908 children joined the library to participate in the program, up 22.7% from the previous year. These statistics suggest that children are responding positively to augmented reality in their libraries.

I think that the best part about this app is that it allows the children’s room to come alive. Children can interact with the library in a way they never have before. Encouraging children to use their devices in the library in a fun and educational way is groundbreaking. They may never have been allowed to play with and learn from their devices at the library before.

The article about the summer reading challenge also discussed the idea of “transliteracy”. The author, Liz McGettigan, says that transliteracy is defined as the “ability to read, write and interact across a range of platforms and tools”. It’s important to encourage children to learn how to use their devices to find the information they are looking for. Encouraging children to use their devices for the summer reading challenge helps them to learn how to do this.

What can libraries do with this? I think that libraries can learn from this example, and not just for summer reading programs. Librarians can create scavenger hunts for kids, either for fun or to help them learn about the library and its services. Children can collect prizes for the things they find in the library using the app. Librarians can even use it to have kids react to and rate the books they read. An app can be designed so that if a child hovers their device over a book they can see other children’s ratings and comments about the book. Libraries can do any of these things and more to create new excitement.

One way for this to work would be for publishers to team up with libraries to create content for similar apps. There would then be many more possibilities for interactive content without worrying about copyright issues. Libraries could create a small section of books that interact with the app. When a child hovers the device over one of these books, the story comes to life and is read aloud.

There are so many possibilities for teaching, learning, and reading while using augmented reality in children’s departments of libraries. The Mythical Maze summer reading program is hopefully only the beginning of using this technology to engage children. With the success of the summer reading challenge, I hope other libraries will consider including it in their programming. Using this technology can enhance learning and create fun new ways to get children excited about reading.

This example illustrates how augmented reality can engage library users through new visualization technologies. Many types of libraries can implement this technology and allow their users to interact with physical materials in ways they never have before.


Apr 21, 7:23pm

Lucy Taylor, who is a Graduate Research Intern in the program, reflects on software curation at the recent LibrePlanet Conference:

LibrePlanet 2016, Software Curation and Preservation

This year’s LibrePlanet conference, organized by the Free Software Foundation, touched on a number of themes that relate to research on software curation and preservation taking place at MIT’s Program on Information Science.

The two-day conference, hosted at MIT, aimed to “examine how free software creates the opportunity of a new path for its users, allows developers to fight the restrictions of a system dominated by proprietary software by creating free replacements, and is the foundation of a philosophy of freedom, sharing, and change.” In a similar way, at the MIT Program on Information Science, we are investigating the ways in which sustainable software might positively impact academic communities and shape future scholarly research practices. This was a great opportunity to compare and contrast the concerns and goals of the Free Software movement with those of people who use software in research.

A number of recurring themes emerged over the course of the weekend that could inform research on software curation. The event kicked off with a conversation between Edward Snowden and Daniel Kahn Gillmor. They tackled privacy and security, and spoke at length about how current digital infrastructures limit our freedoms. Interestingly, they also touched on how to expand the Free Software community and raise awareness among non-technical folks about the need to create, and use, Free Software. A lack of incentives for “newbies” inhibits the growth of the Free Software movement; Free Software needs to compete with proprietary software’s low barriers to entry and user experience. Similarly, the growth of sustainable, reusable, academic software through better documentation, storage, and visibility is inhibited by a lack of incentives for researchers and libraries to improve software development practices and create curation services.

The talks “Copyleft for the next decade: a comprehensive plan” by Bradley Kuhn and “Will there be a next great Copyright Act?” by Peter Higgins both examined the ways in which licensing and copyright are impacting the Free Software movement. The future seems somewhat bleak for GPL licensing and copyleft, with developers being discouraged from using this license and instead putting their work under more permissive licenses, which then allow companies to use and profit from others’ software. Similarly, research gateways like NanoHub and HubZero encounter difficulties in encouraging researchers to make their software freely available for others to use and modify. As both speakers touched on, the general lack of understanding, and also fear, surrounding copyright needs to be remedied. Sci-Hub was also mentioned as an example of a tool that, whilst breaking copyright law, is also revolutionary in nature in that no library has ever aggregated more scientific literature on one platform. How can we create technologies that make scholarly communication more open in the future? Will the curation of software contribute to these aims? Within wider discussions on open access, it is also worthwhile to think about how software can often be a research object in its own right that merits the same curation and concern as journal papers and datasets.

The ideas discussed in the session “Getting the academy to support free software and open science” had many parallels to the research being carried out here at the MIT Program on Information Science. The three speakers spoke about Free Software activities within their home institutions and the barriers that are created by the heavy use of proprietary software at universities. Not only does the continued use of this software result in high costs and the perpetuation of the “centralized web” that relies on companies like Google, Microsoft, and Apple, but it also encourages students to think passively about the technologies they use. Instead, how can we encourage students to think of software as something they can build on and modify through the use of Free Software? Can we develop more engaged academic communities who think about and use software critically through the development of software curation services and sustainable software practices? This was a really interesting discussion that explored problematic infrastructures in higher education.

Finally, Alison Macrina and Nima Fatemi’s talk on the “Library Freedom Project: the long overdue partnership between libraries and free software” put the library front and centre in the role of engaging the wider community in Free Software and advocating for better privacy and more freedom. The Library Freedom Project not only educates librarians and patrons on internet privacy but has also rolled out Tor browsers in a few public libraries. What can academic libraries do to build on this important work and to increase awareness about online freedom within our communities?

The conference was a great way to gain insight into the wider activities of the software community and to talk with others from a multitude of different disciplines. It was interesting to think about how research on software curation services could be informed by these broader discussions on the future of Free Software. Academic librarians should also think about how they can advocate for Free Software in their institutions to encourage better understanding of privacy and to foster environments in which software is critically evaluated to meet user needs. Can libraries embrace the Free Software movement as they have the Open Access movement?

Mar 18, 8:24am

Ophir Frieder, who holds the Robert L. McDevitt, K.S.G., K.C.H.S. and Catherine H. McDevitt L.C.H.S. Chair in Computer Science and Information Processing at Georgetown University and is Professor of Biostatistics, Bioinformatics, and Biomathematics at the Georgetown University Medical Center, gave this talk on Searching in Harsh Environments as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Ophir rebuts the myth that “Google has solved search” and discusses the challenges of searching for complex objects, through hidden collections, and in harsh environments.

In his abstract, Ophir summarizes as follows:

Many consider “searching” a solved problem, and for digital text processing, this belief is factually based.  The problem is that many “real world” search applications involve “complex documents”, and such applications are far from solved.  Complex documents, or less formally, “real world documents”, comprise of a mixture of images, text, signatures, tables, etc., and are often available only in scanned hardcopy formats.   Some of these documents are corrupted.  Some of these documents, particularly of historical nature, contain multiple languages.  Accurate search systems for such document collections are currently unavailable.

The talk discussed three projects. The first project involved developing methods to search collections of complex digitized documents which varied in format, length, genre, and digitization quality; contained diverse fonts, graphical elements, and handwritten annotations; and were subject to errors due to document deterioration and from the digitization process. A second project involved developing methods to enable searchers who arrive with sparse, fragmentary, error-ridden clues about places and people to successfully find relevant connected information in the Archives Section of the United States Holocaust Memorial Museum. A third project involved monitoring Twitter for public health events without relying on a prespecified hypothesis.
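
The second project's core difficulty, matching error-ridden clues against catalog entries, can be illustrated with approximate string matching. The sketch below is not the method described in the talk. It uses Python's standard `difflib` module, and the `PLACE_INDEX` list and `find_candidates` helper are hypothetical.

```python
import difflib

# Hypothetical index of place names as they might appear in an archive catalog.
PLACE_INDEX = ["Warszawa", "Krakow", "Lodz", "Lublin", "Bialystok"]


def find_candidates(clue, index, limit=3, cutoff=0.6):
    """Return index entries whose spelling is close to a possibly misspelled clue."""
    lowered = {name.lower(): name for name in index}
    hits = difflib.get_close_matches(
        clue.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    # Map the lowercase matches back to the catalog's own spelling.
    return [lowered[hit] for hit in hits]


if __name__ == "__main__":
    for clue in ["Krakov", "Bialistok", "Lubin"]:
        print(clue, "->", find_candidates(clue, PLACE_INDEX))
```

Spelling similarity is only one signal. A clue that is fragmentary as well as misspelled would also need context, such as dates or associated people, to find the right record.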

Across these projects, Frieder raised a number of themes:

  • Searching on complex objects is very different from searching the web. Substantial portions of complex objects are invisible to current search, and current search engines do not understand the semantics of relationships within and among objects, which makes the right answers hard to find.
  • Searching across most online content now depends on proprietary algorithms, indices, and logs.
  • Researchers need to be able to search collections of content that may never be made available publicly online by Google or other companies.

Despite the increasing amount of born-digital material, I speculate that these issues will become more salient to research, and that libraries have a role to play in addressing them.

While much of the “scholarly record” is currently being produced in the form of PDFs, which are amenable to the Google searching approach, much web-based content is dynamically generated and customized, and scholarly publications are increasingly incorporating dynamic and interactive features. Searching these effectively will require engaging with scientific output as complex objects.

Further, some areas of science, such as the social sciences, increasingly rely on proprietary collections of big data from commercial sources. Much of this growing evidence base is currently accessible only through proprietary APIs. To meet the heightened requirements for transparency and reproducibility, stewards are needed for these data who can ensure nondiscriminatory long-term research access.

More generally, it is increasingly well recognized that the evidence base of science includes not only published articles and community datasets (and benchmarks), but may also extend to scientific software, replication data, workflows, and even electronic lab notebooks. The article produced at the end is simply a summary description of one pathway through the evidence reflected in these scientific objects. Validating, reproducing, and building on science may increasingly require access to, search over, and understanding of this entire complex set.

Mar 04, 4:17pm

Julia Flanders, who is the Director of the Digital Scholarship Group in the Northeastern University Library and a Professor of Practice in Northeastern’s English Department, gave a talk on Jobs, Roles, Skills, Tools: Working in the Digital Academy as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Julia discusses the evolving landscape of digital humanities (and digital scholarship more broadly) and considers the relationship between technology, tool development, and professional roles.

In her abstract, Julia summarizes as follows:

Twenty-five years ago, jobs in humanities computing were largely interstitial: located in fortuitous, anomalous corners and annexes where specific people with idiosyncratic skill profiles happened to find a niche. One couldn’t train for such jobs, let alone locate them in a market. The emergence of the field of “digital humanities” since that time may appear to be a disciplinary and methodological phenomenon, but it also has to do with labor: with establishing a new set of jobs for which people can be trained and hired, and which define the contours of the work we define as “scholarship.”

In the research described in her talk, Julia identifies seven different roles involved in digital humanities scholarship: developer, administrator, manager, scholar, analyst, data creator, and information manager. She then describes the various skills and metaknowledge required for each and how these roles interact.

(I will note here that the libraries and press have conducted complementary research and engaged in standardization around describing contributorship roles. For more information on this see the Project CRediT site.)

The talk notes the tensions that develop when these roles are out of balance in a project, and particularly the need for balance among scholar, developer, and analyst roles. Her talk notes that a combination of scholar, developer, and analyst in a single person is very productive but rare. More typically, early career researchers start as data creators/coders, learn a particular tool set, and evolve into scholars. In the absence of a strong analyst role this creates “a peculiar relationship with tools: a kind of distance (on the scholar’s part) and on the other hand an intensive proximity (on the coder’s part) that may not yet have critical distance or meta-knowledge: the awareness needed to use the tools in a fully knowing way.”

In my observation of commercial and research software development projects over thirty years, one of the most common causes of catastrophic failure is the gap between the developer’s understanding of the problem being solved and the customer’s understanding of the same problem. A good analyst (often holding a “product manager” title in the corporate world) has the skills to understand both the business and technical domains sufficiently to probe for these misunderstandings and ensure that discussion converges to a common understanding. In addition, the analyst aids in abstracting both the technical and domain problems so that the eventual software solution not only meets the needs of the small number of customers in the loop, but is broad enough for a target community. Moreover, librarians often have knowledge of components of both the technical domain and the subject domain, which can give libraries a particular competitive advantage in developing people for these critical bridge roles.

Feb 25, 2:54pm

Chaoqun Ni, who is an Assistant Professor in the School of Library Science at Simmons, presented a talk on Transformative Interactions in the Scientific Workplace as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Chaoqun uses bibliometric data to analyze the sociality, equality, and dynamicity of the scientific workforce.

In her abstract, Chaoqun describes her argument as follows:

I argue that, for a country to be scientifically competitive, it must maximize its human intellectual capital-base and support this workforce equitably and efficiently. I propose here a large-scale and heterogeneous analysis of the sociality, equality, and dynamicity of the scientific workforce through novel computational models for understanding and predicting the career trajectory of scientists based on their transformative interactions, gender, and levels of funding. This analysis will be able to isolate factors that contribute to the health and well-being of the scientific workforce. The computational models will quantify the impact of those transformative events and interactions and provide models to predict the career trajectory of scientists based on their gender, the size and position of the social network, and other demographic factors.

According to the talk, there are three types of events that are particularly likely to transform scholarly careers: being mentored, publishing, and receiving grants. Of these, mentoring occurs earliest in a scholar’s career and has a persistent effect on publication and grants. The relationship is not simple and automatic: mentees do not automatically inherit their mentors’ success in publication and grant funding. Instead, the mentoring relationship is mediated by transfer of knowledge, norms, advice, and connections. Gender disparities are also persistent and visible.

This talk resonated with a number of areas in which the Program and Library engage:

First, diversity is a core library value, and this research suggests ways in which the libraries can support a more diverse academic community. The success of early career scholars depends in part on developing a substantial number of specialized career skills that are not part of a specific scientific discipline, including, among many other things (see, for example, these slides on reputation and communication), navigating the scholarly publishing process, writing grant proposals, managing bibliographies, and curating data. Much of this knowledge is tacit: it is not explicitly taught but instead transferred through personal mentoring. Libraries are one of the rare parts of the university that are able to successfully capture this tacit knowledge and make it more widely available across the community. The Libraries’ IAP courses are an excellent example of this.

Second, most of the data used for this research is based on Library-mediated collections: citations drawn from journal collections and metadata from dissertation collections. Further, as there is increasing pressure on universities for quantitative evaluation, and increasing desire to actively catalyze collaboration and productivity, there is an increasing need for rich access to Library collections as data, for guidance on tools and approaches (see, for an overview, our class on citation analysis), and for expert assistance. Since few researchers have methodological or domain expertise related to bibliometric and scientometric data, this presents an unusual opportunity for libraries to be entrepreneurial in collaborating on new research.

Third, during this talk, Chaoqun noted that the most laborious and time-consuming phase of the research was the data cleaning and linking phase, particularly dealing with name disambiguation. ORCID, in which the library serves a leadership role (and which MIT has adopted), aims to eliminate this problem. ORCID has spread widely, and just within this month over a dozen major publishers announced their intent to require ORCIDs for journal submissions.
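
To illustrate why persistent author identifiers help with the linking step, the sketch below groups a few made-up publication records first by author name string and then by an author identifier. The records, field names, and identifier values are hypothetical placeholders. They are not real ORCID iDs or output from any ORCID service.

```python
from collections import defaultdict

# Hypothetical records: one person appears under three name variants,
# and two different people share the name string "J. Lee".
RECORDS = [
    {"title": "Paper A", "author": "Lee, J.", "author_id": "id-0001"},
    {"title": "Paper B", "author": "Jordan Lee", "author_id": "id-0001"},
    {"title": "Paper C", "author": "J. Lee", "author_id": "id-0001"},
    {"title": "Paper D", "author": "J. Lee", "author_id": "id-0002"},
]


def group_by(records, key):
    """Map each distinct value of `key` to the titles filed under it."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


if __name__ == "__main__":
    # Name strings split one person three ways and merge two people into one.
    print(group_by(RECORDS, "author"))
    # Identifiers separate the two people cleanly.
    print(group_by(RECORDS, "author_id"))
```

Grouping by name requires exactly the cleaning and linking work described above. Grouping by identifier reduces it to a lookup, provided the identifiers were recorded at submission time.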


Jul 06, 9:48am

Kim Dulin, who is director of the Harvard Library Innovation Lab and Associate Director for Collection Development and Digitization for the Harvard Law School Library, and former co-director of the Harvard Library Innovation Lab, presented a talk on Taking on Link Rot: Harvard Innovation Lab’s Perma.CC as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Kim discusses how libraries can mitigate link rot in legal scholarship by coordinating on digital preservation.

In her abstract, Kim describes her talk as follows:

Perma.cc is a web archiving platform and service developed by the Harvard Library Innovation Lab (LIL) to help combat link rot. Link rot occurs when links to websites point to web pages whose content has changed or disappeared. Perma.cc allows authors and editors to create permanent links for citations to web sources that will not rot. Upon direction from an author, Perma.cc will retrieve and save the contents of a cited web page and assign it a permanent link. The link is then included in the author’s references. When users later follow those references, they will have the option of proceeding to the website as it currently exists or viewing the cached version of the website as the creator of the link saw it. Regardless of what happens to the website in the future, the content will forever be accessible for scholarly and educational purposes via Perma.cc.

According to the talk, link rot in law publications is very high: approximately fifty percent of links in Supreme Court of the US opinions are rotten, and the situation is worse in law journals. Perma has been successful in part because durability is a very important selling point for attorneys. A signal of this is that the latest edition of the official editorial manual for law publications (the “blue book”) now recommends that links included in legal publications be archived.

Perma provides a workable solution to a problem of concern, centered on libraries. In her talk Kim focuses on the diverse role that libraries play. Libraries act as gatekeepers for the content to be preserved; as long-term custodians of the content (and technically as mirrors); and as direct access points.

(I will also note that libraries are critical in conducting research and developing standards in this area. The MIT Library is engaged in developing practices for collaborative stewardship as a member of the National Digital Stewardship Alliance, and the Program is engaged in research on data management and stewardship.)

The talk discusses a number of new directions for Perma, including perma-link plugins for Word and WordPress; an API to the service; creating a private LOCKSS network to replicate archival content; and establishing a formal structure of governance, archival policies, and sustainability (funding and resources).

These directions resonate with me. Perma is currently a project that has been very successful at approaching the very general problem of link rot within a specific community of practice. The success of the project in part has to do with the knowledge of, connections with, and adaptation to a specific community. It will be interesting to see how governance and sustainability evolve to enable the transition from a project to community-supported infrastructure.

Jun 18, 9:04am
It is an old saw that science is founded on reproducibility… However, the truth is that reproducibility has always been more difficult than generally assumed, even where the underlying phenomena are robust. Since Ioannidis’s PLOS article in 2005, there has been increasing attention in medical research to the issue of reproducibility; and attention has been unprecedented in the last two years, with even the New York Times commenting on “jarring” instances of irreproducible, unreliable, or fraudulent research results.
Scientific reproducibility is most often viewed through a methodological or statistical lens, and increasingly, through a computational lens. (See, for example, our book on reproducible statistical computation.) Over the last several years, I’ve taken part in collaborations that approach reproducibility from the perspective of informatics: as a flow of information across a lifecycle that spans collection, analysis, publication, and reuse.
I had the opportunity to present a sketch of this approach at a recent workshop on reproducibility at the National Academy of Sciences, and at one of our Program on Information Science brown bag talks.
The slides from the brown bag talk discuss some definitions of reproducibility, and outline a model for understanding reproducibility as an information flow:
(Also see these videos from the workshop on informatics approaches, and other definitions of reproducibility)
The talk shows how reproducibility claims, as generally discussed in science, are not crisply defined, and how the same reproducibility terminology is used to refer to very different sorts of assertions about the world, experiments, and systems. I outline an approach which takes each type of reproducibility claim and assesses: What are the use cases involving this claim? What does each type of reproducibility claim imply for information properties, flow, and systems? What are proposed or potential interventions in information systems that would strengthen the claims?
For example, a set of reproducibility issues is associated with validation of results. There are several distinct use cases and claims embedded in this — one of which I label as “fact-checking” because of its similarities to the eponymous journalistic use case:
  • Use Case: Post-publication reviewer wants to establish that published claims correspond to analysis method performed.
  • Reproducibility claim: Given public data identifier & analysis algorithm, an independent application of the algorithm yields a new estimate that is within the originally reported uncertainty.
  • Some potential supporting informatics claims:
    1. Instance of data retrieved via identifier is semantically equivalent to instance of data used to support published claim
    2. Analysis algorithm is robust to choice of reasonable alternative implementation
    3. Implementation of algorithm is robust to reasonable choice of execution details and context
    4. Published direct claims about data are semantically equivalent to subset of claims produced by authors' previous application of analysis
  • Some potential informatic interventions:
    • In support of claim 1:
      • Detailed provenance history for data from collection through analysis and deposition.
      • Automatic replication of direct data claims from deposited source
      • Cryptographic evidence
        (e.g. cryptographically signed {analysis output, including cryptographic hash of data} & {cryptographic hash of data retrieved via identifier})
    • In support of claim 2:
      • Standard implementation, subject to community review
      • Report of results of application of implementation on standard testbed
      • Availability of implementation for inspection
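The cryptographic-evidence intervention and the fact-checking claim above can be sketched roughly as follows. This is a minimal illustration and not the approach presented in the talk: the function names, the record layout, and the use of an HMAC with a shared key as a stand-in for a true digital signature are all assumptions made for the example. Note also that a matching hash shows bitwise identity, which is a stronger condition than the semantic equivalence named in claim 1.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sign_output(analysis_output: bytes, data: bytes, key: bytes) -> dict:
    """Bind an analysis output to the hash of the data it was computed from.

    Assumption: an HMAC under a shared key stands in for a real signature.
    """
    data_hash = sha256_hex(data)
    message = analysis_output + b"|" + data_hash.encode("ascii")
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"output": analysis_output, "data_hash": data_hash, "tag": tag}


def verify_record(record: dict, retrieved: bytes, key: bytes) -> bool:
    """Check that the record is untampered and matches the retrieved data."""
    message = record["output"] + b"|" + record["data_hash"].encode("ascii")
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    tag_ok = hmac.compare_digest(expected, record["tag"])
    # Fixity check: data retrieved via the identifier hashes to the same value.
    fixity_ok = hmac.compare_digest(record["data_hash"], sha256_hex(retrieved))
    return tag_ok and fixity_ok


def within_reported_uncertainty(new: float, reported: float, uncertainty: float) -> bool:
    """Fact-checking claim: the new estimate falls within the reported uncertainty."""
    return abs(new - reported) <= uncertainty
```

A post-publication reviewer would call verify_record on the data retrieved via the public identifier, rerun the analysis, and then compare the new estimate against the published one with within_reported_uncertainty.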
Overall, my conjecture is that if we wish to support reproducibility broadly in information systems, there are a number of properties and design principles of information systems that will enhance reproducibility. Within information systems, I conjecture that we should design to maintain properties of: transparency, auditability, provenance, fixity, identification, durability, integrity, repeatability, non-repudiation, and self-documentation. When designing the policies, incentives, and human interactions with these systems, we should consider: barriers to entry, ease of use, support for intellectual communities of practice, personalization, credit and attribution, security, performance, sustainability, cost, and trust engineering.

Apr 29, 12:55pm

David Weinberger, who is a Shorenstein Fellow at Harvard University, and former co-director of the Harvard Library Innovation Lab, presented a talk on Libraries as Platforms: Enabling Libraries to Become Community Centers of Meaning as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, David discusses how libraries can increase their relevance in a networked world by creating information platforms that enable communities to locate, create, and discuss contextually relevant connections among information resources.

In his abstract, David describes his talk as follows:

Libraries are in a unique position to reflect a community back to itself, enabling us to see what matters, and to use that information so that the community learns from itself. This is one of the primary use cases for developing and widely deploying library platforms. But becoming a community center of meaning can easily turn into creating an echo chamber. The key is developing interoperable systems that let communities learn from one another. We’ll look at one proposal for a relatively straightforward way of doing so that’s so dumb that it just might work.

David describes libraries as a “black hole on the Net”: the knowledge and culture that only libraries have been entrusted with is generally not available on the web. He claims that the core institutional advantage of libraries is not only access, but an understanding of what matters to specific communities paired with incentives that are fully aligned with those communities.

His talk that … Meaning comprises a set of connections that are important to a community. Libraries have always been aligned with user communities and helped them discover and make sense of meaningful information. And changes in internet and communication technology create an opportunity for libraries to help communities create and reflect back community meaning.

The talk suggested that libraries can move toward this by creating APIs that enable open access to their open content, and to metadata (broadly defined) related both to content and to the local use of that content; and conjectured that linked data approaches are necessary for integrating platforms and metadata at scale.

David discussed StackLife as an example. StackLife uses circulation metadata to provide a public (aggregated), normalized measure of physical book usage in several libraries. The measure is shareable, allowing for comparisons across libraries.

David did not discuss privacy in detail but noted it as a key issue: to build a successful platform for creating meaning, libraries will need to rethink their approach to patron privacy. Rather than discarding information on patron behaviors and use of services, we need to collect it, and use it in service of the community. In this vision, libraries will provide the infrastructure, and new tools using this infrastructure will be written by people outside of libraries (as well as within).

I will note that the Program is engaged in research toward creating a modern approach to privacy concepts and controls. I will also note that to maintain a platform will require digital sustainability and organizational sustainability. Realizing the former will require designing systems with a view towards supporting long term access. Realizing the latter will require identifying stakeholders that have mutually reinforcing incentives to create digital stuff, use digital stuff created by others, and maintain platforms for such stuff. (Typically, in sciences, such stakeholders are clustered around sets of domain problems…)

A recurring theme of David’s talk was that “libraries won’t invent their own future”: libraries can now see and participate in the cultural appropriation by their communities of the work entrusted to libraries. And open platforms will enable the world to integrate library knowledge into sites, tools, and services that libraries on their own might not have envisioned or have had the resources to develop.

This resonates with me, and I will add that any successful platform will almost certainly require using tools and infrastructure neither built by nor for the libraries. It will also require us to collaborate with organizations far beyond our boundaries.