Issues in Curating the Open Web at Scale: Comments on Gary Price’s Talk

Gary Price, chief editor of InfoDocket, contributing editor of Search Engine Land, and co-founder of Full Text Reports, who has worked with internet search firms and library systems developers alike, gave this talk on Issues in Curating the Open Web at Scale as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Price argued that libraries should engage more aggressively with content on the open web (i.e., stuff you find through Google). He further argued that the methods and knowledge librarians use to curate traditional print collections may be usefully applied to open web content.

Price has been a leader in developing and observing web discovery, both as a director of Ask.com and as author of the first book to cogently summarize the limitations of web search: The Invisible Web: Uncovering Information Sources Search Engines Can’t See. The talk gave a whirlwind tour of the history of curation of the open web and noted the many early efforts to curate resource directories that withered away with the ascent of Google.

In his abstract, Price summarizes as follows:

Much of the web remains invisible: resources are undescribed, unindexed or simply buried (as many people rarely look past the first page of Google searches), or are unavailable from traditional library resources. At the same time many traditional library databases pay little attention to quality content from credible sources accessible on the open web.

How do we build collections of quality open-web resources (i.e. documents, specialty databases, and multimedia) and make them accessible to individuals and user groups when and where they need it?

This talk reflects on the emerging tools for systematic programmatic curation; the legal challenges to open-web curation; long-term access issues; and the historical challenges to building sustainable communities of curation.

Across his talk, Price stressed three arguments.

First, much of the web remains invisible. Many databases and structured information sources are not indexed by Google. And although increasing amounts of structured information are indexed, most of it is behaviorally invisible, since the vast majority of people do not look beyond the first page of Google results. Further, this behavioral invisibility is exacerbated by decreasing support for complex search operators in open web search engines.

Second, Price argued that library curation of the open web would add value: curation would make the invisible web visible, counteract gaming of results, and identify credible sources.

Third, Price argued that a machine-assisted approach can be an effective strategy. He described how tools such as WebSite-Watcher, Archive-It, RSS aggregators, social media monitoring services, and content alerting services can be brought together by a trained curator to develop continually updated collections of content that are of interest to targeted communities of practice. He argued that familiarity with these tools and approaches should be part of the librarian’s toolkit, especially for those in liaison roles.
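
To make the machine-assisted idea concrete, here is a minimal sketch of the RSS aggregation step only. It is not a tool Price presented: the sample feeds, the keyword list, and the function names are all hypothetical, and the keyword filter is a crude stand-in for the first-pass screening a trained curator would then review by hand. It merges items from several feeds, drops duplicates, keeps items matching the topic keywords, and lists the newest first.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feeds; a real workflow would fetch these from feed URLs.
SAMPLE_FEEDS = [
    """<rss version="2.0"><channel><title>Agency Reports</title>
    <item><title>Open data report released</title>
    <link>https://example.org/reports/open-data</link>
    <pubDate>Mon, 02 Mar 2015 10:00:00 GMT</pubDate></item>
    <item><title>Staff picnic photos</title>
    <link>https://example.org/news/picnic</link>
    <pubDate>Tue, 03 Mar 2015 09:00:00 GMT</pubDate></item>
    </channel></rss>""",
    """<rss version="2.0"><channel><title>Research Digest</title>
    <item><title>New report on digital preservation</title>
    <link>https://example.net/digest/preservation</link>
    <pubDate>Wed, 04 Mar 2015 08:30:00 GMT</pubDate></item>
    <item><title>Open data report released</title>
    <link>https://example.org/reports/open-data</link>
    <pubDate>Mon, 02 Mar 2015 10:00:00 GMT</pubDate></item>
    </channel></rss>""",
]


def parse_items(feed_xml):
    """Yield one dict per <item> in an RSS 2.0 document.

    Assumes every item carries a title, link, and RFC 822 pubDate.
    """
    root = ET.fromstring(feed_xml)
    source = root.findtext("channel/title", default="")
    for item in root.iter("item"):
        yield {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }


def build_collection(feeds, keywords):
    """Merge feeds, drop duplicate links, keep keyword matches, newest first."""
    seen = set()
    collection = []
    for feed_xml in feeds:
        for entry in parse_items(feed_xml):
            if entry["link"] in seen:
                continue  # the same item was already collected from another feed
            title = entry["title"].lower()
            if not any(k.lower() in title for k in keywords):
                continue  # off-topic for this community of practice
            seen.add(entry["link"])
            collection.append(entry)
    return sorted(collection, key=lambda e: e["published"], reverse=True)


collection = build_collection(SAMPLE_FEEDS, ["report", "preservation"])
for entry in collection:
    print(entry["published"].date(), entry["source"], entry["title"], entry["link"])
```

The output of a step like this is a candidate list, not a collection: deciding which sources are credible and which items belong remains the curator's job.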

Similar tools are discussed in the courses we teach on professional reputation management, and I’ve found a number (particularly the latter three) useful as an individual professional. More generally, I speculate that curation of the open web will become a larger part of the library mission. As we argued in the 2015 National Agenda for Digital Stewardship, organizations rely on more information than they can directly steward. The central problem is coordinating stakeholders around stewarding collections from which they derive common value. This remains a deep and unsolved problem. However, efforts such as The Keepers Registry and collaborations such as the International Internet Preservation Consortium (IIPC) and the National Digital Stewardship Alliance (NDSA) are making progress in this area.


See also: drmaltman