Alex Chassanoff is a CLIR/DLF Postdoctoral Fellow in the Program working to identify, understand, and describe baseline characteristics of software creation, use, and reuse in research libraries and archives, grounded in cases found across MIT.
Below is a (growing) compendium of resources related to software curation for collecting institutions.
What’s missing? Email me here!
I. Collecting/Acquiring/Appraising Software
Data Management, Planning & Policies
Cornell's Guide to Writing "Readme" Style Metadata: Templates, best practices, and guidance for creating "readme" files to accompany data sets and software.
Data Management Planning Tool (2011-present): An online application that helps researchers create data management plans.
Depsy (2015-present): Depsy helps users investigate impact metrics for scientific software, tracking research software packages hosted on CRAN (the software repository for the R programming language) or PyPI (the software repository for Python).
GNU Ethical Repository Criteria: Criteria for "hosting parts of the GNU operating system"; they can also be used to evaluate other repositories that host free source code (and, optionally, executable programs).
1st IEEE Workshop on Future of Research Curation and Research Reproducibility (2016): Summarizes workshop discussions and recommendations related to curation of research data, software, and related artifacts.
IFLA Key Issues for E-Resources Collection Development: A Guide for Libraries (2012): Overview for libraries that addresses some key issues in collecting “e-resources.”
Springer Nature Research Data Policies (2016): Frequently asked questions from researchers about data policies, data repositories, and data sharing.
Guidelines & Tools
Guidelines for Transparency and Openness Promotion in Journal Policies: Established by the Open Science Framework. "The TOP Guidelines provide a template to enhance transparency in the science that journals publish. With minor adaptation of the text, funders can adopt these guidelines for research that they fund."
How to Appraise and Select Research Data for Curation (2010): A discussion of appraisal concepts; although geared toward research data, it offers insight into practices for appraising software.
Media Stability Ratings (2018): Assigns a "media stability rating" to different media formats in an attempt to mitigate loss.
Stewardship of E-Manuscripts (2009): A compilation of tools for acquiring and stewarding born-digital materials.
Timbus Debian Software Extractor (2015): A tool to extract metadata from Debian software packages, developed as part of the Timbus Context Project.
II. Describing Data/Software/Environments
Descriptive Standards & Definitions
Asset Description Metadata Schema for Software: A metadata schema and vocabulary for describing software, making it easier to explore, find, and link software on the Web.
DataCite (2016-present): A metadata schema for the publication and citation of research data.
Data Documentation Initiative (2011-present): Standard to describe the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences.
DDI-RDF Discovery Vocabulary (2013): RDF vocabulary to support the discovery of micro-data sets (aka "raw data") and related metadata using RDF technologies.
Force 11 Software Citation Principles (2016): A consolidated set of citation principles that may encourage broad adoption of a consistent policy for software citation across disciplines and venues.
Software Ontology (2011): A resource for describing software tools, their types, tasks, versions, provenance, and associated data.
Trove Software Map: Classifies software by attributes such as development status, environment, intended audience, name, natural language, operating system, programming language, and topic.
Software Search is Not a Science, Even Among Scientists (2016): A survey of how researchers search for software, including the criteria they use to evaluate search results (e.g., how easy the software is to learn).
Examples of Cataloged Software/Data Sets/Repositories
re3data: Registry of research data repositories
Case Studies & Reports
A Case Study in Preserving a High Energy Physics Application with Parrot (2015): Describes the development of Parrot, an application dependency capture program for complex environments.
Exploring Curation-Ready Software (2017): Report 1 by the Curation-Readiness Working Group at the Software Preservation Network.
Heritage.exe (2016): Cross-comparison case study of software preservation strategies at three US institutions.
Improving Curation-Readiness (2017): Report 2 by the Curation-Readiness Working Group at the Software Preservation Network.
Preserving and Emulating Digital Art Objects (2015): Reports on the results of an NEH-funded research project "to create contemporary emulation environments for artworks selected from the archive, to classify works according to type and document research discoveries regarding the preservation effort."
Preserving Virtual Worlds I, II (2007-2010; 2011-2013): The Preserving Virtual Worlds projects I and II explore methods for preserving digital games and interactive fiction.
Preserving.Exe: Toward a National Strategy for Software Preservation (2013): A report from the National Digital Information Infrastructure and Preservation Program of the Library of Congress, focused on identifying valuable and at-risk software.
SPN Metadata Survey (2017): Survey results on how institutions with digital preservation programs are using metadata to aid in preserving software.
The Digital Curation Sustainability Model (DCSM) (2015): JISC-funded project to highlight the key concepts, relationships, and decision points for planning how to sustain digital assets into the future.
National Software Reference Library (NSRL): The NSRL is designed to collect software from various sources and incorporate file profiles computed from this software into a Reference Data Set (RDS) of information.
PERSIST (2012-present): UNESCO-hosted initiative to "ensure long-term access to the World’s Digital Heritage by facilitating development of effective policies, sustainable technical approaches, and best preservation practices."
Software Preservation Network (SPN) (2013-present): A community of practitioners and researchers working to address the problem of how to preserve software.
Software Heritage Network (2016-present): "The goal of the SHN is to collect all publicly available software in source code form, replicate it massively to ensure its preservation, and make it available to everyone who needs it."
Tools, Applications, Best Practices & Standards
Library of Congress Recommended Format Statement for Software: "Identifies hierarchies of the physical and technical characteristics of software which will best meet the needs of all concerned, maximizing the chances for survival and continued accessibility of creative content well into the future."
National Archives' Strategy for Preserving Digital Archival Materials (2017): Overview of strategies used by NARA to preserve digital materials.
Obsolescence Ratings (2018): "This list categorizes the ease with which a range of formats that have been, or are, in common use in their fields can be read, in terms of the equipment available to do so."
Pericles Extraction Tool (2015-present): Extracts significant environment information from live environments to better support object use and reuse in the context of long-term data preservation.
Preservation Quality Tool (2016-present): "This tool will provide for reuse of preserved software applications, improve technical infrastructure, and build on existing data preservation services."
Software Independent Archival of Relational Databases (SIARD) (2007): An open file format developed by the Swiss Federal Archives for the long-term archiving of relational databases; data can be stored long-term independently of the original software.