Guest Post: Curation as Context: Software in the Stacks

Alex Chassanoff is a CLIR/DLF Postdoctoral Fellow in the Program on Information Science and continues a series of posts on software curation.

As scholarly landscapes shift, differing definitions for similar activities may emerge from different communities of practice.   As I mentioned in my previous blog post, there are many distinct terms for (and perspectives on) curating digital content depending on the setting and whom you ask [1].  Documenting and discussing these semantic differences can play an important role in crystallizing shared, meaningful understandings.  

In the academic research library world,  the so-called data deluge has presented library and information professionals with an opportunity to assist scholars in the active management of their digital content [2].  Curating research output as institutional content is a relatively young, though growing phenomenon.  Research data management (RDM) groups and services are increasingly common in research libraries, partially fueled by changes in federal funding grant application requirements to encourage data management planning.  In fact, according to a recent content analysis of academic library websites, 185 libraries are now offering RDM services [3].  The charge for RDM groups can vary widely; tasks can range from advising faculty on issues related to privacy and confidentiality, to instructing students on potential avenues for publishing open-access research data.

As these types of services increase, many research libraries are looking to life cycle models as foundations for crafting curation strategies for digital content [4].  On the one hand, life cycle models recognize the importance of continuous care and necessary interventions that managing such content requires.  Life cycle models also provide a simplified view of essential stages and practices, focusing attention on how data flows through a continuum.  At the same time, the data flow perspective can obscure both the messiness of the research process and the complexities of managing dynamic digital content [5,6].  What strategies for curation can best address scenarios where digital content is touched at multiple times by multiple entities for multiple purposes?  

Christine Borgman notes the multifaceted role that data can play in the digital scholarship ecosystem, serving a variety of functions and purposes for different audiences.  Describing the most salient characteristics of that data may or may not serve the needs of future use and/or reuse. She writes:

These technical descriptions of “data” obscure the social context in which data exist, however. Observations that are research  findings  for  one  scientist  may  be background context to another. Data that are adequate evidence for one purpose (e.g., determining whether water quality is safe for surfing) are inadequate for others (e.g., government standards for testing drinking water). Similarly, data that are synthesized for one purpose may be “raw” for another. [7]

Particular data sets may be used and then reused for entirely different intentions.  In fact, enabling reuse is a hallmark objective for many current initiatives in libraries/archives.  While forecasting future use is beyond our scope, understanding more about how digital content is created and used in the wider scholarly ecosystem can prove useful for anticipating future needs.  As Henry Lowood argues, “How researchers will actually put their hands and eyes on historical software and data collections generally has been bracketed out of data curation models focused on preservation”[8].  

As an example, consider the research practices and output of faculty member Alice, who produces research tools and methodologies for data analysis. If we were to document the components used and/or created by Alice for this particular research project, it might include the following:

  • Software program(s) for computing published results
  • Dependencies for software program(s) for replicating published results
  • Primary data collected and used in analysis
  • Secondary data collected and used in analysis
  • Data result(s) produced by analysis
  • Published journal article

We can envision at least two uses of this particular instantiation of scholarly output. First, the statistical results of the data can be verified by replicating the conditions of the analysis.   Second, the statistical approach executed by the software program can be executed on a new inputted data set.  In this way, software can simultaneously serve as both an outcome to be preserved and as a methodological means to an (new) end.  

There are certain affordances in thinking about strategies for curation-as-context, outside the life cycle perspective.  Rather than emphasizing content as an outcome to be made accessible and preserved through a particular workflow, curation could instead aim to encompass the characterization of well-formed research objects, with an emphasis on understanding the conditions of their creation, production, use, and reuse.   Recalling our description of Alice above, we can see how each component of the process can be brought together to represent an instantiation of a contextually-rich research object.

Curation-as-context approaches can help us map the always-already in flux terrain of dynamic digital content.  In thinking about curating software as a complex object for access, use, and future use, we can imagine how mapping the existing functions, purposes, relationships, and content flows of software within the larger digital scholarship ecosystem may help us anticipate future use, while documenting contemporary use.  As Cal Lee writes:

Relationships to other digital objects can dramatically affect the ways in which digital objects have been perceived and experienced. In order for a future user to make sense of a digital object, it could be useful for that user to know precisely what set of surrogate representations – e.g. titles, tags, captions, annotations, image thumbnails, video keyframes – were associated with a digital object at a given point in time. It can also be important for a future user to know the constraints and requirements for creation of such surrogates within a given system (e.g. whether tagging was required, allowed, or unsupported; how thumbnails and keyframes were generated), in order to understand the expression, use and perception of an object at a given point in time [9].

Going back to our previous blog post, we can see how questions like “How are researchers creating and managing their digital content” are essential counterparts to questions like “What do individuals served by the MIT Libraries need to able to reuse software?” Our project aims to produce software curation strategies at MIT Libraries that embrace Reagan Moore’s theoretical view of digital preservation, whereby “information generated in the past is sent into the future” [10].  In other words, what can we learn about software today that makes an essential contribution to meaningful access and use tomorrow?  

Works Cited

[1] Palmer, C., Weber, N., Muñoz, T, and Renar, A. (2013), “Foundations of data curation: The pedagogy and practice of ‘purposeful work’ with research data”, Archives Journal, Vol 3.

[2] Hey, T.  and Trefethen, A. (2008), “E-science, cyberinfrastructure, and scholarly communication”, in Olson, G.M. Zimmerman, A., and Bos, N. (Eds), Scientific Collaboration on the Internet, MIT Press, Cambridge, MA.

[3] Yoon, A. and Schultz, T. (2017), “Research data management services in academic libraries in the US: A content analysis of libraries’ websites” (in press). College and Research Libraries.

[4] Ray, J. (2014), Research Data Management: Practical Strategies for Information Professionals, Purdue University Press, West Lafayette, IN.

[5] Carlson, J. (2014), “The use of lifecycle models in developing and supporting data services”, in Ray, J. (Ed),  Research Data Management: Practical Strategies for Information Professionals, Purdue University Press, West Lafayette, IN.

[6] Ball, A. (2010), “Review of the state of the art of the digital curation of research data”, University of Bath.

[7] Borgman, C., Wallis, J. and Enyedy, N. (2006), “Little science confronts the data deluge: Habitat ecology, embedded sensor networks, and digital libraries”, Center for Embedded Network Sensing, 7(1–2), 17 – 30. doi: 10.1007/s00799-007-0022-9. UCLA: Center for Embedded Network Sensing.  

[8] Lowood, H. (2013), “The lures of software preservation”, Preserving.exe: Towards a national strategy for software preservation, National Digital Information Infrastructure and Preservation Program of the Library of Congress.

[9] Lee, C. (2011), “A framework for contextual information in digital collections”, Journal of Documentation 67(1).

[10] Moore, R. (2008), “Towards a theory of digital preservation”, International Journal of Digital Curation 3(1).


See also: Alexandra