Guest Post: DataRescue-Boston@MIT Wrap up

Alex Chassanoff  who is a Postdoctoral Fellow in the Program for Information Science, contributes to this detailed wrapup of the recent Data Rescue Boston event that she helped organize.


Data Rescue Boston@MIT Wrap up

Written by event organizers:

Alexandra Chassanoff

Jeffrey Liu

Helen Bailey

Renee Ball

Chi Feng


On Saturday, February 18th, the MIT Libraries and the Association of Computational Science and Engineering co-hosted a day long Data Rescue Boston hackathon at Morss Hall in the Walker Memorial Building.  Jeffrey Liu, a Civil and Environmental Engineering graduate student at MIT, organized the event as part of an emerging North American movement to engage communities locally in the safeguarding of potentially vulnerable federal research information.  Since January, Data Rescue events have been springing up at libraries across the country, largely through the combined organizing efforts of Data Refuge and Environmental Data and Governance Initiative.


The event was sponsored by MIT Center for Computational Engineering, MIT Department of Civil and Environmental Engineering, MIT Environmental Solutions Initiative, MIT Libraries, MIT Graduate Student Council Initiatives Fund, and the Environmental Data and Governance Initiative.

Here are some snapshot metrics from our event:

# of Organizers: 8
# of Volunteers: ~15
# of Guides: 9
# of Participants: ~130
# URLs researched: 200
# URLs harvested: 53
# GiB harvested: 35
# URLs seeded: 3300 at event (~76000 from attendees finishing after event)
# Agency Primers started: 19
# Cups of Coffee: 300
# Burritos: 120
# Bagels: 450
# Pizzas: 105

Goal 1. Process data

MIT’s data rescuers managed to process a similar amount of data through the seeding and harvesting phases of data rescue as compared to other similarly-sized events.  For reference, Data Rescue San Francisco researched 101 URLs and harvested 25 GB of data at their event.  Data Rescue DC, a two-day event which also included a bagging/describing track which we did not have, harvested 20GB of data, seeded 4776 URLs, bagged 15 datasets and described 40 data sets.   

Goal 2. Expand scope

Another goal of our event was to explore creating new workflows for expanding efforts beyond an existing focus on federal agency environmental and climate data.  Toward that end, we decided to pilot a new track called Surveying which we used to identify and describe programs, datasets and documents at federal agencies still in need of agency primers.  We were lucky enough to have particular domain experts on hand who assisted us with our efforts.  In total, we were able to begin expansion efforts for agencies and departments at the Department of Justice, Department of Labor, Health and Human Services, and the Federal Communications Commission.

Goal 3: Engage and build community

Attendees at our event spanned age groups, occupations, and technical abilities.  Participants included research librarians, concerned scientists, and expert undergraduate hackers; according to national developers for the Data Rescue archiving application, MIT had the largest number of “tech-tel” than any other event thus far.   As part of the Storytelling aspect of Data Rescue events, we captured profiles for twenty-seven of our attendees.  Additionally, we created Data Use Stories that describe how some researchers use specific data sets from the National Water Information System (USGS), the Alternative Fuels Data Center (DOE),  and the Global Historical Climate Network (NOAA).  These stories let us communicate how these data sets are used to better understand our world, as well as make decisions that impact our everyday lives.

The hackathon at MIT was the second event hosted by Data Rescue Boston, which has begun hosting weekly working groups every Thursday at MIT  for continuing engagement on compiling tools and documentation to improve workflow, identify vulnerable data sets, and create resources to help further efforts.   

Future Work

Data rescue events continue to gather steam, with eight major national events planned over the next month.  The next DataRescue Boston event will be held at Northeastern on March 24th. A dozen volunteers and attendees from the MIT event have already signed up to help organize workshops and efforts at the Northeastern event.

Press Coverage of our Event:

See also: drmaltman