Blog

May 01, 11:09am

Professor Laura Hosman, who is an Assistant Professor at Arizona State University (with a joint appointment in the School for the Future of Innovation in Society and The Polytechnic School), gave this talk, Becoming a Practitioner Scholar in Technology for Development, as part of the Program on Information Science Brown Bag Series.

In her talk, illustrated by the slides below, Hosman argues that, for a large part of the world “the library of the future” will be based on cellphones, intranets, and digital-but-offline content.

 

 

Hosman abstracted her talk as follows:

Access to high-quality, relevant information is absolutely foundational for a quality education. Yet, so many schools across the developing world lack fundamental resources, like textbooks, libraries, electricity, and Internet connectivity. The SolarSPELL (Solar Powered Educational Learning Library) is designed specifically to address these infrastructural challenges, by bringing relevant, digital educational content to offline, off-grid locations. SolarSPELL is a portable, ruggedized, solar-powered digital library that broadcasts a webpage with open-access educational content over an offline WiFi hotspot, content that is curated for a particular audience in a specified locality—in this case, for schoolchildren and teachers in remote locations. It is a hands-on, iteratively developed project that has involved undergraduate students in all facets and at every stage of development. This talk will examine the design, development, and deployment of a for-the-field technology that looks simple but has a quite complex background.
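
The core technical idea, serving a curated collection over a local, offline WiFi hotspot, can be illustrated in a few lines of Python. The sketch below shows only the general approach; the content path and port are hypothetical, and the talk does not describe SolarSPELL’s actual implementation.

```python
# Minimal sketch of an offline "library in a box": serve a directory of curated
# content to devices on a local hotspot. Illustrative only; this is not
# SolarSPELL's actual implementation, and the path and port are hypothetical.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

CONTENT_DIR = "/home/pi/library"  # hypothetical location of the curated content

# Serve files from CONTENT_DIR and bind to all interfaces, so any device that
# joins the hotspot can browse the collection with no Internet connection.
handler = partial(SimpleHTTPRequestHandler, directory=CONTENT_DIR)
HTTPServer(("0.0.0.0", 8080), handler).serve_forever()
```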

In her talk, Hosman describes how the inspiration for her current line of research and practice started when she received a request to aid deployment of the One Laptop Per Child project in Haiti. The original project had allocated twenty-five million dollars to laptop purchasing, but failed to note that electric power was not available in many of the areas they needed to reach — so they asked for Professor Hosman’s help in finding an alternative power source. Over the course of her work, the focus of her interventions has shifted from solar power systems, to portable computer labs, to portable libraries — and she noted that every successful approach involved evolution and iteration.

Hosman observes that for much of the world’s population, electricity is a missing prerequisite to computing and to connectivity. She also notes that access to computing for most of the world comes through cell phones, not laptops. (And she recalls finding that the inhabitants of remote islands occasionally had better cellphones than she carried.) Her talk notes that there are over seven billion cell phones in the world — over three times the number of computers worldwide, and many thousands of times the number of libraries.

Hosman originally titled her talk The Solar Powered Educational Learning Library – Experiential Learning And Iterative Development. The talk’s new title reflects one of three core themes that ran through the talk — the importance of people. Hosman argues that technology is never by itself sufficient (there is no “magic bullet”) — to improve people’s lives, we need to understand and engineer for people’s engagement with technology.

The SolarSPELL project has engaged with people in surprising ways. Not only is it designed around the needs of the target clients, but it has continuously involved Hosman’s engineering students in its design and improvement, and has further involved high-school students in construction. Under Hosman’s direction, university and high school students worked together to construct a hundred SolarSPELLs using mainly parts ordered from Amazon. Moreover, Peace Corps volunteers are a critical part of the project: they provide the grass-roots connections that spark people to initially try the SolarSPELL, and a persistent human connection that supports continuing engagement.

A second theme of the talk is the importance of open and curated content. Simply making a collection freely available online is not enough if we want most people in the world to be able to access it. For collections to be meaningfully accessible, they need to be available for bulk download; they need to be usable under an open license; they need to be selected for a community of use that does not have the option of seeking more content online; and they need to contain all of the context needed for that community to understand them.

A final theme that Hosman stresses is that any individual (scholar, practitioner, actor) will never have all the skills needed to address complex problems in the complex real world — solving real world problems requires a multidisciplinary approach. SolarSPELL demonstrates this through combining expertise in electrical engineering, content curation, libraries, software development, education, and in the sociology and politics of the region. Notably, the ASU libraries have been a valuable partner in the SolarSPELL project, and have even participated in fieldwork. Much more information about this work and its impact can be found in Hosman’s scholarly papers.

The MIT libraries have embraced a vision of serving a global community of scholars and learners. Hosman’s work demonstrates the existence of large communities of learners that would benefit from open educational and research materials — but whose technology needs are not met by most current information platforms (even open ones). Our aim is that future platforms not only enable research and educational content to reach such communities, but also that local communities worldwide can contribute their local knowledge, perspective, and commentary to the world’s library.

Surprisingly, the digital preservation research conducted at the libraries is of particular relevance to tackling these challenges. The goal of digital preservation can be thought of as communicating with the future — and in order to accomplish this, we need to be able to capture both content and context, steward it over time (managing provenance, versions, and authenticity), and prepare it to be accessed through communication systems and technologies that do not yet exist. A corollary is that properly curated content should be readily capable of being stored and delivered offline — which is currently a major challenge for access by the broader community.
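
As a concrete illustration of packaging curated content for offline storage and delivery, the sketch below uses the BagIt format via the Python bagit library to add checksums and minimal provenance metadata. The directory and metadata values are hypothetical, and this is one possible approach rather than a library recommendation.

```python
# Sketch: package a directory of curated content as a BagIt "bag", adding
# checksums and minimal provenance metadata so the package can be verified
# after offline transfer. The path and metadata values are hypothetical.
import bagit

bag = bagit.make_bag(
    "curated_collection",                        # directory packaged in place
    {"Source-Organization": "Example Library",   # provenance metadata
     "External-Description": "Curated offline collection, version 1"},
)

# Later -- for example after copying the bag to an offline device -- verify
# that every file is present and matches its recorded checksum.
bag.validate()
```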

Reflecting the themes of Hosman’s talk, the research we conduct here, in the Program on Information Science, is fundamentally interdisciplinary: for example, our research in information privacy has involved librarians, computer scientists, statisticians, legal scholars, and many others. Our Program also aims to bridge research and practice and to support translational and applied research, which often requires sustained engagement with grassroots stakeholders. For example, the success of the DIY redistricting (a.k.a. “participative GIS”) efforts in which we’ve collaborated relied on sustained engagement with grassroots good-government organizations (such as Common Cause and the League of Women Voters), students, and the media. For those interested in these and other projects, we have published reports and articles describing them.


Apr 21, 10:13am

Alex Chassanoff, who is a Postdoctoral Fellow in the Program on Information Science, continues a series of posts on software curation.

“Curation as Context:

Software in the Stacks”

As scholarly landscapes shift, differing definitions for similar activities may emerge from different communities of practice.   As I mentioned in my previous blog post, there are many distinct terms for (and perspectives on) curating digital content depending on the setting and whom you ask [1].  Documenting and discussing these semantic differences can play an important role in crystallizing shared, meaningful understandings.  

In the academic research library world,  the so-called data deluge has presented library and information professionals with an opportunity to assist scholars in the active management of their digital content [2].  Curating research output as institutional content is a relatively young, though growing phenomenon.  Research data management (RDM) groups and services are increasingly common in research libraries, partially fueled by changes in federal funding grant application requirements to encourage data management planning.  In fact, according to a recent content analysis of academic library websites, 185 libraries are now offering RDM services [3].  The charge for RDM groups can vary widely; tasks can range from advising faculty on issues related to privacy and confidentiality, to instructing students on potential avenues for publishing open-access research data.

As these types of services increase, many research libraries are looking to life cycle models as foundations for crafting curation strategies for digital content [4].  On the one hand, life cycle models recognize the importance of continuous care and necessary interventions that managing such content requires.  Life cycle models also provide a simplified view of essential stages and practices, focusing attention on how data flows through a continuum.  At the same time, the data flow perspective can obscure both the messiness of the research process and the complexities of managing dynamic digital content [5,6].  What strategies for curation can best address scenarios where digital content is touched at multiple times by multiple entities for multiple purposes?  

Christine Borgman notes the multifaceted role that data can play in the digital scholarship ecosystem, serving a variety of functions and purposes for different audiences.  Describing the most salient characteristics of that data may or may not serve the needs of future use and/or reuse. She writes:

These technical descriptions of “data” obscure the social context in which data exist, however. Observations that are research findings for one scientist may be background context to another. Data that are adequate evidence for one purpose (e.g., determining whether water quality is safe for surfing) are inadequate for others (e.g., government standards for testing drinking water). Similarly, data that are synthesized for one purpose may be “raw” for another. [7]

Particular data sets may be used and then reused for entirely different intentions.  In fact, enabling reuse is a hallmark objective for many current initiatives in libraries/archives.  While forecasting future use is beyond our scope, understanding more about how digital content is created and used in the wider scholarly ecosystem can prove useful for anticipating future needs.  As Henry Lowood argues, “How researchers will actually put their hands and eyes on historical software and data collections generally has been bracketed out of data curation models focused on preservation”[8].  

As an example, consider the research practices and output of faculty member Alice, who produces research tools and methodologies for data analysis. If we were to document the components used and/or created by Alice for this particular research project, it might include the following:

 

  • Software program(s) for computing published results
  • Dependencies for software program(s) for replicating published results
  • Primary data collected and used in analysis
  • Secondary data collected and used in analysis
  • Data result(s) produced by analysis
  • Published journal article

 

We can envision at least two uses of this particular instantiation of scholarly output. First, the statistical results can be verified by replicating the conditions of the analysis. Second, the statistical approach implemented by the software can be applied to a new input data set. In this way, software can simultaneously serve as both an outcome to be preserved and as a methodological means to a (new) end.
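
One way to make these relationships explicit is to record each component and its role in a single structured research-object record. The sketch below is purely illustrative: the identifiers, field names, and values are hypothetical, not a prescribed schema.

```python
# Illustrative sketch: a contextually rich record for Alice's research output,
# linking the software to the data, dependencies, and publication it relates
# to. All identifiers, field names, and values are hypothetical.
research_object = {
    "article": {"doi": "10.9999/example.2017.001"},
    "software": {
        "repository": "https://example.org/alice/analysis-code",
        "version": "1.2.0",
        "role": ["generates published results", "reusable method"],
        "dependencies": ["python 3.5", "numpy 1.11", "pandas 0.19"],
    },
    "primary_data": {"doi": "10.9999/example.data.primary"},
    "secondary_data": {"doi": "10.9999/example.data.secondary"},
    "results": {"doi": "10.9999/example.data.results"},
    "context": {
        "creator": "Alice",
        "created": "2017-01-15",
        "purpose": "statistical analysis supporting the published article",
    },
}
```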

There are certain affordances in thinking about strategies for curation-as-context, outside the life cycle perspective.  Rather than emphasizing content as an outcome to be made accessible and preserved through a particular workflow, curation could instead aim to encompass the characterization of well-formed research objects, with an emphasis on understanding the conditions of their creation, production, use, and reuse.   Recalling our description of Alice above, we can see how each component of the process can be brought together to represent an instantiation of a contextually-rich research object.

Curation-as-context approaches can help us map the always-already in flux terrain of dynamic digital content.  In thinking about curating software as a complex object for access, use, and future use, we can imagine how mapping the existing functions, purposes, relationships, and content flows of software within the larger digital scholarship ecosystem may help us anticipate future use, while documenting contemporary use.  As Cal Lee writes:

Relationships to other digital objects can dramatically affect the ways in which digital objects have been perceived and experienced. In order for a future user to make sense of a digital object, it could be useful for that user to know precisely what set of surrogate representations – e.g. titles, tags, captions, annotations, image thumbnails, video keyframes – were associated with a digital object at a given point in time. It can also be important for a future user to know the constraints and requirements for creation of such surrogates within a given system (e.g. whether tagging was required, allowed, or unsupported; how thumbnails and keyframes were generated), in order to understand the expression, use and perception of an object at a given point in time [9].

Going back to our previous blog post, we can see how questions like “How are researchers creating and managing their digital content?” are essential counterparts to questions like “What do individuals served by the MIT Libraries need to be able to reuse software?” Our project aims to produce software curation strategies at the MIT Libraries that embrace Reagan Moore’s theoretical view of digital preservation, whereby “information generated in the past is sent into the future” [10].  In other words, what can we learn about software today that makes an essential contribution to meaningful access and use tomorrow?

Works Cited

[1] Palmer, C., Weber, N., Muñoz, T., and Renear, A. (2013), “Foundations of data curation: The pedagogy and practice of ‘purposeful work’ with research data”, Archives Journal, Vol. 3.

[2] Hey, T. and Trefethen, A. (2008), “E-science, cyberinfrastructure, and scholarly communication”, in Olson, G.M., Zimmerman, A., and Bos, N. (Eds), Scientific Collaboration on the Internet, MIT Press, Cambridge, MA.

[3] Yoon, A. and Schultz, T. (2017), “Research data management services in academic libraries in the US: A content analysis of libraries’ websites” (in press). College and Research Libraries.

[4] Ray, J. (2014), Research Data Management: Practical Strategies for Information Professionals, Purdue University Press, West Lafayette, IN.

[5] Carlson, J. (2014), “The use of lifecycle models in developing and supporting data services”, in Ray, J. (Ed),  Research Data Management: Practical Strategies for Information Professionals, Purdue University Press, West Lafayette, IN.

[6] Ball, A. (2010), “Review of the state of the art of the digital curation of research data”, University of Bath.

[7] Borgman, C., Wallis, J. and Enyedy, N. (2006), “Little science confronts the data deluge: Habitat ecology, embedded sensor networks, and digital libraries”, Center for Embedded Network Sensing, 7(1–2), 17–30. doi: 10.1007/s00799-007-0022-9.

[8] Lowood, H. (2013), “The lures of software preservation”, Preserving.exe: Towards a national strategy for software preservation, National Digital Information Infrastructure and Preservation Program of the Library of Congress.

[9] Lee, C. (2011), “A framework for contextual information in digital collections”, Journal of Documentation 67(1).

[10] Moore, R. (2008), “Towards a theory of digital preservation”, International Journal of Digital Curation 3(1).

 


Mar 02, 1:13am

Alex Chassanoff, who is a Postdoctoral Fellow in the Program on Information Science, contributes to this detailed wrap-up of the recent Data Rescue Boston event that she helped organize.

 

Data Rescue Boston@MIT Wrap up

Written by event organizers:

Alexandra Chassanoff

Jeffrey Liu

Helen Bailey

Renee Ball

Chi Feng

 

On Saturday, February 18th, the MIT Libraries and the Association of Computational Science and Engineering co-hosted a day-long Data Rescue Boston hackathon at Morss Hall in the Walker Memorial Building.  Jeffrey Liu, a Civil and Environmental Engineering graduate student at MIT, organized the event as part of an emerging North American movement to engage communities locally in the safeguarding of potentially vulnerable federal research information.  Since January, Data Rescue events have been springing up at libraries across the country, largely through the combined organizing efforts of Data Refuge and the Environmental Data and Governance Initiative.

 

The event was sponsored by MIT Center for Computational Engineering, MIT Department of Civil and Environmental Engineering, MIT Environmental Solutions Initiative, MIT Libraries, MIT Graduate Student Council Initiatives Fund, and the Environmental Data and Governance Initiative.

Here are some snapshot metrics from our event:

# of Organizers: 8
# of Volunteers: ~15
# of Guides: 9
# of Participants: ~130
# URLs researched: 200
# URLs harvested: 53
# GiB harvested: 35
# URLs seeded: 3300 at event (~76000 from attendees finishing after event)
# Agency Primers started: 19
# Cups of Coffee: 300
# Burritos: 120
# Bagels: 450
# Pizzas: 105

Goal 1. Process data

MIT’s data rescuers processed a similar amount of data through the seeding and harvesting phases as other similarly-sized events.  For reference, Data Rescue San Francisco researched 101 URLs and harvested 25 GB of data at their event.  Data Rescue DC, a two-day event that also included a bagging/describing track (which we did not have), harvested 20 GB of data, seeded 4,776 URLs, bagged 15 datasets, and described 40 datasets.
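
For readers unfamiliar with the workflow, “seeding” means nominating a URL so that a web crawler captures it, while “harvesting” involves downloading content that crawlers cannot easily capture. The event used the Data Refuge / EDGI tooling; purely as a rough illustration of what seeding involves, the sketch below nominates hypothetical URLs to the Internet Archive’s public Save Page Now endpoint.

```python
# Rough illustration of "seeding": nominate URLs for capture by a web archive.
# This uses the Internet Archive's public Save Page Now endpoint as an example;
# the event itself used the Data Refuge / EDGI workflow and its own tools.
import requests

urls_to_seed = [
    "https://www.example.gov/dataset-landing-page",  # hypothetical URLs
    "https://www.example.gov/report.pdf",
]

for url in urls_to_seed:
    response = requests.get("https://web.archive.org/save/" + url, timeout=60)
    print(url, "->", response.status_code)
```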

Goal 2. Expand scope

Another goal of our event was to explore creating new workflows for expanding efforts beyond the existing focus on federal environmental and climate data.  Toward that end, we decided to pilot a new track, called Surveying, which we used to identify and describe programs, datasets, and documents at federal agencies still in need of agency primers.  We were lucky enough to have domain experts on hand who assisted us with these efforts.  In total, we were able to begin expansion efforts for agencies and departments including the Department of Justice, the Department of Labor, Health and Human Services, and the Federal Communications Commission.

Goal 3: Engage and build community

Attendees at our event spanned age groups, occupations, and technical abilities.  Participants included research librarians, concerned scientists, and expert undergraduate hackers; according to the national developers of the Data Rescue archiving application, MIT had the largest number of “tech-tel” of any event thus far.  As part of the Storytelling aspect of Data Rescue events, we captured profiles for twenty-seven of our attendees.  Additionally, we created Data Use Stories that describe how some researchers use specific data sets from the National Water Information System (USGS), the Alternative Fuels Data Center (DOE), and the Global Historical Climatology Network (NOAA).  These stories let us communicate how these data sets are used to better understand our world, as well as to make decisions that impact our everyday lives.

The hackathon at MIT was the second event hosted by Data Rescue Boston, which now hosts weekly working groups every Thursday at MIT for continuing engagement: compiling tools and documentation to improve the workflow, identifying vulnerable data sets, and creating resources to help further efforts.

Future Work

Data rescue events continue to gather steam, with eight major national events planned over the next month.  The next DataRescue Boston event will be held at Northeastern on March 24th. A dozen volunteers and attendees from the MIT event have already signed up to help organize workshops and efforts at the Northeastern event.

Press Coverage of our Event:

http://gizmodo.com/rescuing-government-data-from-trump-has-become-a-nation-1792582499

https://thetech.com/2017/02/22/datarescue-students-collaborate-vital

https://medium.com/binj-reports/saving-science-one-dataset-at-a-time-389c7014199c#.lgrlkca9f



Jan 21, 9:42am

Alex Chassanoff, who is a Postdoctoral Fellow in the Program on Information Science, introduces a series of posts on software curation.


Building A Model for Software Curation:

An Introductory Post

 

In October 2016, I began working at the MIT Libraries as a CLIR/DLF Postdoctoral Fellow in Software Curation. CLIR began offering postdoctoral fellowships in data curation in 2012; however, three others and I were part of the first cohort conducting research in the area of software curation.  At our fellowship seminar and training this summer, the four of us joked about not having any idea what we would be doing (and Google wasn’t much help). Indeed, despite years of involvement in digital curation, I was unsure of what it might mean to curate software. As has been well documented in the library/archival science community, curation of data means many different things to many different people.  Add in the term “software” and the complexities only increase.

At the MIT Libraries, I have had the good fortune of working with two distinguished experts in library research: Nancy McGovern, the Director of the Digital Preservation Program, and Micah Altman, the Director of Research.  This blog post describes the first phase of our work together in defining a research agenda for software curation as an institutional asset.

Defining Scope

As we began to suss out possible research objectives and assorted activities, we found ourselves circling back to four central questions – which themselves split into associated sub-questions.

  • What is software? What is the purpose and function of software? What does it mean to curate software? How do these practices differ from preservation?
  • When do we curate software? Is it at the time of creation? Or when it becomes acquired by an institution?
  • Why do institutions and researchers curate software?
  • Who is institutionally responsible for curating software and for whom are we curating software?

Developing Focus and Purpose

We also began to outline the types of exploratory research questions we might ask depending on the specific purpose and entities we were creating a model for (see Table 1 below). Of course, these are only some of the entities that we could focus on; we could also broaden our scope to include research questions of interest to software publishers, software journals, or funders interested in software curation.

 

Entity | Purpose: Libraries/Archives | Purpose: MIT Specific
Research library | What does a library need to safeguard and preserve software as an asset? How are other institutions handling this? How are funding agencies considering research on software curation? | What are the MIT Libraries’ existing and future needs related to software curation?
Software creator | What are the best practices software creators should adopt when creating software? How are software creators depositing their software, and how are journals recommending they do this? | What are the individual needs and existing practices of software creators served by the MIT Libraries?
Software user | What are the different kinds of reasons why people may use software? What are the conditions for use? What are the specific curation practices we should implement to make software usable for this community? | What do individuals served by the MIT Libraries need to be able to reuse software?

Table 1: Potential purpose(s) of research by entity

Importantly, we wanted to adopt an agile research approach that considered software as an artifact, rather than (simply) as an outcome to be preserved and made accessible.  Curation in this sense might seek to answer ontological questions about software as an entity with significant characteristics at different levels of representation.   Certainly, digital object management approaches that emphasize documentation of significant properties or characteristics are long-standing in the literature.  At the same time, we wanted our approach to address essential curatorial activities (what Abby Smith termed “interventions”) that help ensure digital files remain accessible and usable. [1]  We returned to our shared research vision: to devise a model for software curation strategies to assist research outcomes that rely on the creation, use, reuse, and study of software.

Statement of Research Objectives and Working Definitions

Given the preponderance of definitions for curation and the wide-ranging implications of curating for different purposes and audiences, we thought it would be essential for us to identify and make clear our particular interests.  We developed the following statement to best describe our goals and objectives:

Libraries and archives are increasingly tasked with responsibilities related to the effective long-term preservation and curation of software.  The purpose of our work is to investigate and make recommendations for strategies that institutions can adopt for managing software as complex digital objects across generations of technology.

We also developed the following working definition of software curation for use in our research:

“Software curation encompasses the active practices related to the creation, acquisition, appraisal and selection, description, transformation, preservation, storage, and dissemination/access/reuse of software over short- and long- periods of time.”
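
As a small illustration of the “description” activity named in this definition, a curator might capture basic structured metadata about a software artifact, loosely in the spirit of community vocabularies such as CodeMeta. The fields and values below are hypothetical and not a recommended schema.

```python
# Illustrative sketch: minimal descriptive metadata for a software artifact,
# loosely inspired by community vocabularies such as CodeMeta.
# Field names and values are hypothetical, not a recommended schema.
software_description = {
    "name": "example-analysis-toolkit",
    "version": "0.3.1",
    "authors": ["A. Researcher"],
    "license": "MIT",
    "programming_language": "Python",
    "repository": "https://example.org/researcher/example-analysis-toolkit",
    "dependencies": ["numpy", "scipy"],
    "date_created": "2016-11-01",
    "related_publication": "10.9999/example.2016.042",  # DOI of related article
    "intended_environment": "Linux, Python 3.5",
}
```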

What’s Next

The next phase of our research involves formalizing our research approach through the evaluation, selection, and application of relevant models (such as the OAIS Reference Model) and ontologies (such as the SWO). We are also developing different scenarios to establish the roles and responsibilities bound up in software creation, use, and reuse. In addition to reporting on the status of our project, you can expect to read blog posts about both the philosophical and practical implications of curating software in an academic research library setting.

Notes

[1] In the seminal collection Authenticity in a Digital Environment, Abby Smith noted that “We have to intervene continually to keep digital files alive. We cannot put a digital file on a shelf and decide later about preservation intervention. Storage means active intervention.” See: Abby Smith (2000), “Authenticity in Perspective”, Authenticity in a Digital Environment, Washington, DC: Council on Library and Information Resources.


Dec 13, 1:54pm

Zachary Lizee, who is a Graduate Research Intern in the Program on Information Science, reflects on his investigations into information standards, and suggests how libraries can reach beyond local instruction in digital literacy to scalable education that catalyzes information citizenship.

21st century Libraries, Standards Education and Socially Responsible Information Seeking Behavior

Standards and standards development frame, guide, and normalize almost all areas of our lives.  Standards in IT govern interoperability between a variety of devices and platforms, standardized production of machine parts allows uniform repair and reproduction, and standardization in fields like accounting, health care, or agriculture promotes best industry practices that emphasize safety and quality control.  Informational standards like OpenDocument allow digital information to be stored and processed by most types of software, ensuring that the data is recoverable in the future.[1]  Standards reflect the shared values, aspirations, and responsibilities we as a society project upon each other and our world.

Engineering and other innovative, entrepreneurial fields need awareness of information standards and standards development to ensure that the results of research, design, and development in these areas have the most positive net outcome for our world at large, as illustrated by this analysis of healthcare information standards by HIMSS, a professional organization that works to shape informational standards in the healthcare IT field:

In healthcare, standards provide a common language and set of expectations that enable interoperability between systems and/or devices. Ideally, data exchange schema and standards should permit data to be shared between clinician, lab, hospital, pharmacy, and patient regardless of application or application vendor in order to improve healthcare delivery. [2]

As critical issues regarding information privacy multiply, standards development organizations and interested stakeholders are taking an active interest in creating and maintaining standards that regulate how personal data is stored, transferred, and used, with both public interests and legal frameworks in mind.[3]

Libraries have traditionally been centers of expertise in, and access to, information collection, curation, dissemination, and instruction.  The standards governing how digital information is produced, used, governed, and transmitted are rapidly evolving with new technologies.[4]  Libraries are participating in the processes of generating information standards to ensure that patrons can freely and safely access information.  For instance, the National Information Standards Organization is developing informational standards to address patron privacy issues in library data management systems:

The NISO Privacy Principles, available at http://www.niso.org/topics/tl/patron_privacy/, set forth a core set of guidelines by which libraries, systems providers and publishers can foster respect for patron privacy throughout their operations.  The Preamble of the Principles notes that, ‘Certain personal data are often required in order for digital systems to deliver information, particularly subscribed content.’ Additionally, user activity data can provide useful insights on how to improve collections and services. However, the gathering, storage, and use of these data must respect the trust users place in libraries and their partners. There are ways to address these operational needs while also respecting the user’s rights and expectations of privacy.[5]

This effort by NISO (which has librarians on its steering committee) illustrates how libraries engage in outreach and advocacy in concert with the ALA’s Code of Ethics, which states that libraries have a duty to protect patrons’ rights to privacy and confidentiality regarding their information seeking behavior.  Libraries and librarians have a long tradition of engaging in social responsibility for their patrons and the community at large.

Although libraries are sometimes involved, most information standards are created by engineers working in corporate settings, or are considerably influenced by the development of products that become the model.  Most students leave the university without understanding what standards are, how they are developed, and what potential social and political ramifications advancements in the engineering field can have on our world.[6]

There is a trend in the academic and professional communities to foster greater understanding about what standards are, why they are important, and how they relate to influencing and shaping our world.[7]  Understanding the relevance of standards will be an asset that employers in the engineering fields will value and look for.  Keeping informed about the most current standards can drive innovation and increase the market value of an engineer’s research and design efforts.[8]

As informational hubs, libraries have a unique opportunity to participate in developing information literacy regarding standards and standards development.  By infusing philosophies regarding socially responsible research and innovation, using standards instruction as a vehicle, librarians can emphasize the net positive effect of standards and ethics awareness for the individual student and the world at large.

The emergence of MOOCs creates an opportunity for librarians to reach a large audience to instruct patrons in information literacy in a variety of subjects. MOOCs can have a number of advantages when it comes to being able to inform and instruct a large number of people from a variety of geographic locations and across a range of subject areas.[9]

For example, a subject-specialist librarian for an engineering department at a university could work with engineering faculty to develop a MOOC that outlines the issues, facts, and procedures surrounding standards and standards development, aiding the faculty in teaching standards education.  Together, librarians and subject experts could develop instruction on the roles that standards and socially responsible behavior play in the field of engineering.

Students who learn early in their careers why standards are an integral element of engineering and related fields have the potential to produce influential ideas, products, and programs that could have positive and constructive effects for society.  Engineering endeavors to design products, methodologies, and other technologies that can have a positive impact on our world.  Standards education in engineering fields can produce students who have a keen awareness of human dignity, human justice, and overall human welfare, and a sense of global responsibility.

Our world faces a number of challenges: poverty, oppression, political and economic strife, environmental issues, and a host of other dilemmas that socially responsible engineers and innovators could address.  Educating engineers and innovators about standards and socially responsible behavior can affect future corporate responsibility, ethical and humanitarian behavior, and altruistic technical research and development, which in turn yields a net positive result for the individual, society, and the world.


Notes:

[1] OASIS, “OASIS Open Document Format for Office Applications TC,” <https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office>

[2] HIMSS, “Why do we need standards?,” <http://www.himss.org/library/interoperability-standards/why-do-we-need-standards>

[3] Murphy, Craig N. and JoAnne Yates, The International Organization for Standardization (ISO): Global governance through voluntary consensus, London and New York: Routledge, 2009.

[4] See Opening Standards: The Global Politics of Interoperability, edited by Laura DeNardis, Cambridge, Massachusetts: MIT Press, 2011.

[5] “NISO Releases a Set of Principles to Address Privacy of User Data in Library, Content-Provider, and Software-Supplier Systems,” NISO,  <http://www.niso.org/news/pr/view?item_key=678c44da628619119213955b867838b40b6a7d96>

[6] “IEEE Position Paper on the Role of Technical Standards in the Curriculum of Academic Programs in Engineering, Technology and Computing,” IEEE,  <https://www.ieee.org/education_careers/education/eab/position_statements.html>

[7] Northwestern Strategic Standards Management, <http://www.northwestern.edu/standards-management/>

[8] “Education about standards,” ISO, <http://www.iso.org/iso/home/about/training-technical-assistance/standards-in-education.htm>

[9] “MOOC Design and Delivery: Opportunities and Challenges,” Current Issues in Emerging ELearning, V.3, Issue 1,(2016) <http://scholarworks.umb.edu/ciee/?utm_source=scholarworks.umb.edu%2Fciee%2Fvol2%2Fiss1%2F6&utm_medium=PDF&utm_campaign=PDFCoverPages>


Dec 08, 11:51am

Dr. Anthony Scriffignano, who is SVP/Chief Data Scientist at Dun & Bradstreet, gave this talk on Making Decisions in a World Awash in Data: We’re Going to Need a Different Boat as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Scriffignano argues that the massive collection of ‘unstructured’ data enables a wide set of potential inferences about complex, changing relationships.  At the same time, he notes that it is increasingly easy to gather enough information to take action while lacking enough information to form good judgments, and that understanding the context in which data is collected and flows is essential to developing such judgments.

Scriffignano summarizes his talk in the following abstract:

I explore some of the ways in which the massive availability of data is changing and the types of questions we must ask in the context of making business decisions.  Truth be told, nearly all organizations struggle to make sense out of the mounting data already within the enterprise.  At the same time, businesses, individuals, and governments continue to try to outpace one another, often in ways that are informed by newly-available data and technology, but just as often using that data and technology in alarmingly inappropriate or incomplete ways.  Multiple “solutions” exist to take data that is poorly understood, promising to derive meaning that is often transient at best.  A tremendous amount of “dark” innovation continues in the space of fraud and other bad behavior (e.g. cyber crime, cyber terrorism), highlighting that there are very real risks to taking a fast-follower strategy in making sense out of the ever-increasing amount of data available.  Tools and technologies can be very helpful or, as Scriffignano puts it, “they can accelerate the speed with which we hit the wall.”  Drawing on unstructured, highly dynamic sources of data, fascinating inference can be derived if we ask the right questions (and maybe use a bit of different math!).  This session will cover three main themes: The new normal (how the data around us continues to change), how are we reacting (bringing data science into the room), and the path ahead (creating a mindset in the organization that evolves).  Ultimately, what we learn is governed as much by the data available as by the questions we ask.  This talk, both relevant and occasionally irreverent, will explore some of the new ways data is being used to expose risk and opportunity and the skills we need to take advantage of a world awash in data.

This covers a broad scope, and Dr. Scriffignano expands extensively on these and other issues in his blog — which is well worth reading.

Dr. Scriffignano’s talk raised a number of interesting provocations. The talk claims, for example, that:

On data.

  • No data is real-time — there are always latencies in measurement, transmission, or analysis.
  • Most data is worthless — but there remains a tremendous number of useful signals in data that we don’t understand.
  • Eighty-five percent of data collected today is ‘unstructured’. And ‘unstructured’ data is really data whose structure we do not yet understand.

On using data.

  • Unstructured data has the potential to support many unanticipated inferences. An example (which Scriffignano calls a “data-bubble”) is a collection of crowd-sourced photos of recurring events — one can find photos that are taken at different times but which show the same location from the same perspective. Despite being convenience samples, they permit new longitudinal comparisons from which one could extract signals of fashion, attention, technology use, attitude, etc. — and big data collection has created qualitatively new opportunities for inference.
  • When collecting and curating data, we need to pay close attention to decision-elasticity — how different would our information have to be to change our optimal action? (See the sketch after this list.) In designing a data curation strategy, one needs to weigh the opportunity costs of obtaining and curating data against the potential to affect decisions.
  • Increasingly, big data analysis raises ethical questions.  Some of these questions arise directly: what are the ethical expectations on the use of ‘new’ signals we discover can be extracted from unstructured data?  Others arise through the algorithms we choose: how do they introduce biases, and how do we even understand what algorithms do, especially as the use of artificial intelligence grows? Scriffignano’s talk gives as an example recent AI research in which two algorithms developed their own private encryption scheme.
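
The following toy calculation illustrates decision-elasticity as described above: new data is worth acquiring and curating only if it could plausibly shift our estimates enough to change the chosen action. The payoffs and probabilities are invented for illustration and are not from the talk.

```python
# Toy illustration of decision-elasticity: additional data is only worth
# acquiring and curating if it could plausibly change the optimal action.
# All numbers are invented for illustration.

def optimal_action(p_success):
    """Choose between launching a product (payoff depends on p_success)
    and holding off (payoff 0)."""
    expected_payoff_launch = p_success * 100 - (1 - p_success) * 60
    return "launch" if expected_payoff_launch > 0 else "hold"

current_estimate = 0.45                 # current estimate of success probability
plausible_shift_from_new_data = 0.05    # how far new data could move the estimate

actions = {optimal_action(current_estimate + delta)
           for delta in (-plausible_shift_from_new_data, 0,
                         plausible_shift_from_new_data)}

# If the action is the same across the plausible range of estimates, the
# decision is inelastic to the new data, and curating it adds little value here.
print("decision changes with new data:", len(actions) > 1)
```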

This is directly relevant to the future of research and the future of research libraries.  Research will increasingly rely on evidence sources of these types — and will increasingly need to access, discover, and curate this evidence.  And our society will increasingly be shaped by this information, and by how we choose to engineer and govern its collection and use.  The private sector is pushing ahead fast in this area, and will no doubt generate many innovative data collections and algorithms.  Engagement from university scholars, researchers, and librarians is vital to ensure that society understands these new creations; is able to evaluate their reliability and bias; and has durable and equitable access to them to provide accountability and to support important discoveries that are not easily monetized.  For those interested in this topic, the Program on Information Science has published reports and articles on big data inference and ethics.


Oct 29, 10:46am

Rebecca Kennison, who is the Principal of K|N Consultants, the co-founder of the Open Access Network, and was the founding director of the Center for Digital Research and Scholarship, gave this talk on Come Together Right Now: An Introduction to the Open Access Network as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Kennison argues that current models of OA publishing based on cost-per-unit are neither scalable nor sustainable.  She further argues that a sustainable model must be based on continuing regular voluntary contributions from research universities.  

In her abstract, Kennison summarizes as follows:

Officially launched just over a year ago, the Open Access Network (OAN) offers a transformative, sustainable, and scalable model of open access (OA) publishing and preservation that encourages partnerships among scholarly societies, research libraries, and other partners (e.g., academic publishers, university presses, collaborative e-archives) who share a common mission to support the creation and distribution of open research and scholarship and to encourage more affordable education, which can be a direct outcome of OA publishing. Our ultimate goal is to develop a collective funding approach that is fair and open and that fully sustains the infrastructure needed to support the full life-cycle for communication of the scholarly record, including new and evolving forms of research output. Simply put, we intend to Make Knowledge Public.

Kennison’s talk summarizes the argument in her 2014 paper with Lisa Norberg: A Scalable and Sustainable Approach to Open Access Publishing and Archiving for Humanities and Social Sciences. Those intrigued by these arguments may find a wealth of detail in the full paper.

Kennison argues that this form of network would offer value to three groups  of stakeholders in general:

  • For institutions and libraries: advance research and scholarship, lower the cost of education, and support lifelong learning.
  • For scholarly societies and university presses: ensure revenue, sustain operations, and support innovation.
  • For individuals, foundations, and corporations: provide wide access to research and scholarship to address societal challenges, support education, and grow the economy.

The Program on Information Science has previously written on the information economics of the commons in Information Wants Someone Else to Pay For It.  An economic analysis of the OAN poses two critical questions. First: what is the added value to the individual contributor that they would not obtain unless they individually contribute? (Note that this is different from the group value above — since any stakeholder gets those values if the OAN exists, whether or not they contribute to it.)  Second: under what conditions does the approach lead to the right amount of information being produced?  Both market-based and purely altruistic approaches to producing knowledge outputs yield something — they just don’t yield anything close to the socially optimal level of knowledge production and use. What reasons do we have to believe that the fee structure of the OAN comes closer?

In addition, Kennison discussed the field of linguistics as a prototype. It is a comparatively small discipline (thousands of researchers) and its output is concentrated in approximately 60 journals.  Notably, a number of high-profile departments recently changed their tenure and promotion policies to recognize the OA journal Glossa as the equivalent of the top journal Lingua, after the latter’s editorial board departed in protest.

This is a particularly interesting example, because successful management of knowledge commons is often built around coherent communities. For commons management to work, as Ostrom’s work shows, behavior must be reliably observable within a community, and the community must be able to impose its own effective and graduated sanctions and determine its own rules for doing so.  I conjecture that particular technical and/or policy-based solutions to knowledge commons management (let’s call these “knowledge infrastructure”) have the potential to scale when three conditions hold: (1) the knowledge infrastructure addresses a vertical community that includes an interdependent set of producers and consumers of knowledge; (2) the approach provides substantial incentives for individuals in that vertical community to contribute, while providing public goods both (a) to that community and (b) to a larger community; and (3) the approach is built upon community-specific extensions of more general-purpose infrastructure.


Oct 24, 11:59am

Gary Price, who is chief editor of InfoDocket, contributing editor of Search Engine Land, co-founder of Full Text Reports and who has worked with internet search firms and library systems developers alike, gave this talk on Issues in Curating the Open Web at Scale as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Price argued that libraries should engage more aggressively with content on the open web (i.e., material you find through Google). He further argued that the traditional methods and knowledge used by librarians to curate print collections may be usefully applied to open web content.

Price has been a leader in developing and observing web discovery — as a director of ask.com, and as author of the first book to cogently summarize the limitations of web search: The Invisible Web: Uncovering Information Sources Search Engines Can’t See.  The talk gave a whirlwind tour of the history of curation of the open web, and noted the many early efforts aimed at curating resource directories that withered away with the ascent of Google.

In his abstract, Price summarizes as follows:

Much of the web remains invisible: resources are undescribed, unindexed, or simply buried (many people rarely look past the first page of Google results), or are unavailable from traditional library resources. At the same time, many traditional library databases pay little attention to quality content from credible sources accessible on the open web.

How do we build collections of quality open-web resources (i.e. documents, specialty databases, and multimedia) and make them accessible to individuals and user groups when and where they need it?

This talk reflects on the emerging tools for systematic programmatic curation; the legal challenges to open-web curation; long term access issues, and the historical challenges to building sustainable communities of curation.

Across his talk, Price stressed three arguments.

First, that much of the web remains invisible: many databases and structured information sources are not indexed by Google. And although increasing amounts of structured information are indexed, most of it is behaviorally invisible, since the vast majority of people do not look beyond the first page of Google results.  Further, this behavioral invisibility is exacerbated by the decreasing support for complex search operators in open web search engines.

Second, Price argued that library curation of the open web would add value: curation would make the invisible web visible, counteract gaming of results, and identify credible sources.

Third, Price argued that a machine-assisted approach can be an effective strategy. He described how tools such as website watchers, Archive-It, RSS aggregators, social media monitoring services, and content alerting services can be brought together by a trained curator to develop continually updated collections of content that are of interest to targeted communities of practice. He argued that familiarity with these tools and approaches should be part of the librarian’s toolkit — especially for those in liaison roles.
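
To give a flavor of this machine-assisted approach, the sketch below polls a few RSS feeds with the feedparser library and flags items matching topic keywords for a curator to review. The feed URLs and keywords are placeholders, and this is just one simple way such monitoring might be wired together.

```python
# Simple sketch of machine-assisted curation: poll RSS feeds and flag items
# matching topics of interest for a curator to review. Feed URLs and keywords
# are placeholders, not an endorsement of specific sources.
import feedparser

FEEDS = [
    "https://example.gov/newsroom/rss.xml",
    "https://example.edu/reports/feed",
]
KEYWORDS = ["open data", "water quality", "broadband"]

for feed_url in FEEDS:
    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
        if any(keyword in text for keyword in KEYWORDS):
            # A curator reviews flagged items before adding them to a collection.
            print(entry.get("title"), "->", entry.get("link"))
```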

Similar tools are discussed in the courses we teach on professional reputation management — and I’ve found a number of them (particularly the latter three) useful as an individual professional.  More generally, I speculate that curation of the open web will become a larger part of the library mission — as we have argued in the 2015 National Agenda for Digital Stewardship, organizations rely on more information than they can directly steward.  The central problem is coordinating stakeholders around stewarding collections from which they derive common value.  This remains a deep and unsolved problem; however, efforts such as The Keepers Registry and collaborations such as the International Internet Preservation Consortium (IIPC) and the National Digital Stewardship Alliance (NDSA) are making progress in this area.


Aug 24, 10:01pm

Infrastructure and practices for data citation have made substantial progress over the last decade. This increases the potential rewards for data publication and reproducible science; however, overall incentives remain relatively weak for many researchers.

This blog post summarizes a presentation given at the National Academies of Sciences as part of  Data Citation Workshop: Developing Policy And Practice.  The slides from the talk are embedded below:

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Principles

Academic researchers as a class are drawn to research and scholarship through an interest in puzzle-solving, but they are also substantially motivated by recognition and money.  Typically, these incentives are shaped and channeled through the processes and institutions of tenure and review; publication; grants, awards, and prizes; industry consulting; and professional collaboration and mentoring. [1]

Citations have been described as academic “currency”, and while this is not literally true, they are a particularly visible form of recognition in the academy, and increasingly tied to monetary incentives as well. [2] Thus rules, norms, and institutions that affect citation practices have a substantial potential to change incentives.

When effort is invisible it is apt to be undervalued. Data has been the “dark matter” of the scholarly ecosystem — data citation aims to make the role of data visible.  While the citation of data is not entirely novel, there has been a concerted effort across researchers, funders, and publishers over approximately the last decade to reengineer data citation standards and tools to create more rational incentives to create reusable and reproducible research. [3]  In more formal terms, the proximate aim of the current data citation movement is to make transparent the linkages between research claims, the evidence base on which these claims are based; and the contributors who are responsible for that evidence base. The longer term aim is to shift the equilibrium of incentives so that building the common scientific evidence base is rewarded in proportion to its benefit to the overall scientific community.

Progress

There has been notable progress in standards, policies, and tools for data citation since the ‘bad old days’ of 2007, which Gary King and I grimly characterized at the time [4]:

How much slower would scientific progress be if the near universal standards for scholarly citation of articles and books had never been developed? Suppose shortly after publication only some printed works could be reliably found by other scholars; or if researchers were only permitted to read an article if they first committed not to criticize it, or were required to coauthor with the original author any work that built on the original. How many discoveries would never have been made if the titles of books and articles in libraries changed unpredictably, with no link back to the old title; if printed works existed in different libraries under different titles; if researchers routinely redistributed modified versions of other authors’ works without changing the title or author listed; or if publishing new editions of books meant that earlier editions were destroyed? …

Unfortunately, no such universal standards exist for citing quantitative data, and so all the problems listed above exist now. Practices vary from field to field, archive to archive, and often from article to article. The data cited may no longer exist, may not be available publicly, or may have never been held by anyone but the investigator. Data listed as available from the author are unlikely to be available for long and will not be available after the author retires or dies. Sometimes URLs are given, but they often do not persist. In recent years, a major archive renumbered all its acquisitions, rendering all citations to data it held invalid; identical data was distributed in different archives with different identifiers; data sets have been expanded or corrected and the old data, on which prior literature is based, was destroyed or renumbered and so is inaccessible; and modified versions of data are routinely distributed under the same name, without any standard for versioning. Copyeditors have no fixed rules, and often no rules whatsoever. Data are sometimes listed in the bibliography, sometimes in the text, sometimes not at all, and rarely with enough information to guarantee future access to the identical data set. Replicating published tables and figures even without having to rerun the original experiment, is often difficult or impossible.

A decade ago, while some publishers had data transparency policies, they were routinely honored in the breach. Now, a number of high-profile journals both require that authors cite or include the data on which their publications rest, and have mechanisms to enforce this. PLOS is a notable example — its Data Availability Statement [5] states not only that data should be shared, but that articles should provide the persistent identifiers of shared data, and that these should resolve to well-known repositories.

A decade ago, the only major funder that had an organization-wide data sharing policy was NIH [6] — and this policy had notable limitations: it was limited to large grants, and the resource sharing statements it required were brief, not peer reviewed, and not monitored. Today, as Jerry Sheehan noted in his presentation on Increasing Access to the Results of Federally Funded Scientific Research: Data Management and Citation, almost all federal support for research now complies with the Holdren memo, which requires policies and data management plans “describing how they will provide for long-term preservation of, and access to, scientific data”. [7]  A number of foundation funders have adopted similar policies. Furthermore, as panelist Patricia Knezek noted, data management plans are now part of the peer review process at the National Science Foundation, and datasets may be included in the biosketches that are part of the funding application process.

A decade ago, few journals published replication data, and no high-profile journals existed that published data.  Over the last several years, the number of data journals has increased, and Nature Research launched Scientific Data — which has substantially raised the visibility of data publications.

A decade ago, tools for data citation were non-existent, and the infrastructure for general open data sharing outside of specific community collections was essentially limited to ICPSR’s publication-related archive [8] and Harvard’s Virtual Data Center [9] (which later became the Dataverse Network). Today, as panelists throughout the day noted, [10] infrastructure such as CKAN, Figshare, and close to a dozen Dataverse-based archives accepts open data from any field; [11] there are rich public directories of archives such as RE3data; and large data citation indices such as DataCite and the TR Data Citation Index enable data citations to be discovered and evaluated. [12]
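To make the discovery side concrete, here is a minimal sketch (not a complete client) of searching the DataCite index for dataset records through its public REST API. The search term is arbitrary, and the JSON field names follow DataCite's response layout at the time of writing, so treat them as assumptions rather than a stable contract.

```python
# Minimal sketch: search the DataCite REST API for dataset records.
# The query string is arbitrary; field names follow DataCite's JSON:API
# response layout and may change over time.
import requests

resp = requests.get(
    "https://api.datacite.org/dois",
    params={"query": "coral reef survey", "page[size]": 3},
    timeout=30,
)
resp.raise_for_status()
for record in resp.json().get("data", []):
    attrs = record.get("attributes", {})
    titles = "; ".join(t.get("title", "") for t in attrs.get("titles", []))
    print(attrs.get("doi"), "-", titles)
```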

These new tools are critical for creating data-sharing incentives and rewards. They allow data to be shared and discovered for reuse, reuse to be attributed, and that attribution to be incorporated into metrics of scholarly productivity and impact. Moreover, much of this infrastructure exists in large part because it received substantial startup support from the research funding community.

Perforations

While open repositories and data citation indices enable researchers to get credit more effectively for data that is cited directly, and there is limited evidence that sharing research data is associated with higher citation rates, data sharing and citation remain quite limited. [13] As Marcia McNutt notes in her talk on Data Sharing: Some Cultural Perspectives, progress likely depends at least as much on cultural and organizational change as on technical advances.

Formally, the indexing of data citations enables citations to data to contribute to a researcher’s h-index and other measures of scholarly productivity. As speaker Dianne Martin noted in the panel on Reward/Incentive Structures, her institution (George Washington University) has begun to recognize data sharing and citation in the tenure and review process.
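For concreteness, the h-index calculation itself is simple: it is the largest h such that h of a researcher's outputs each have at least h citations. In the minimal sketch below (the citation counts are invented for illustration), indexed data citations would simply enter the same list as article citations.

```python
def h_index(citation_counts):
    """Largest h such that at least h outputs have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Invented example: four articles and one published dataset
print(h_index([25, 8, 5, 3, 1]))  # -> 3
```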

Despite the substantial progress over the last decade, there is little evidence that the incorporation of data citation and publication into tenure and review is yet either systematic or widespread. Overall, positive incentives for citing data still appear relatively weak:

  1. It remains the case that data is often used without being cited. [14]
  2. Even where data is cited, most individual data publications (with notable exceptions, primarily large community databases) are neither in high-impact venues nor highly cited. Since scientists achieve significant recognition most often through publication in “high-impact” journals, and increasingly through publishing articles that are highly cited, devoting effort to data publishing carries a high opportunity cost.
  3. Even when the data is cited, publishing one’s data is often perceived as increasing the likelihood that others will “leapfrog” the original research and publish high-impact articles with priority. Since scientific recognition relies strongly on priority of publication, this risk is a disincentive.
  4. While data citation and publication likely strengthens reproducibility, it also makes it easier for others to criticize published work. In the absence of strong positive rewards for reproducible research, this risk may be a disincentive overall.  

Funders and policy-makers have the potential to do more to strengthen positive incentives. Funders should support mechanisms to assess and assign “transitive” credit, which would give some share of the credit for a publication to the data and publications on which it relies. [15] And funders and policy-makers should support strong positive incentives for reproducible research, such as funding and explicit recognition. [16]
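As a rough illustration of the transitive-credit idea (the pass-through fraction and the dependency graph below are entirely hypothetical, not taken from [15]), each research product could pass a share of its credit down to the data and software it builds on:

```python
# Hypothetical sketch of "transitive" credit: each product passes a share
# of its credit to the products it depends on. Weights are illustrative only.
def allocate_credit(product, dependencies, credit=1.0, pass_through=0.2):
    """Return a dict mapping each research product to its share of `credit`.

    `dependencies` maps a product to the products it builds on (assumed
    acyclic). A `pass_through` fraction of each product's credit is split
    equally among its direct dependencies, recursively.
    """
    shares = {}

    def distribute(node, amount):
        deps = dependencies.get(node, [])
        passed = amount * pass_through if deps else 0.0
        shares[node] = shares.get(node, 0.0) + (amount - passed)
        for dep in deps:
            distribute(dep, passed / len(deps))

    distribute(product, credit)
    return shares

deps = {"article": ["dataset", "software"], "dataset": ["survey"]}
print(allocate_credit("article", deps))
# The article keeps 0.8 of the credit; the dataset and software each receive
# about 0.1, and the dataset passes roughly 0.02 on to the underlying survey.
```

The point of the sketch is simply that downstream reuse would translate into measurable, if fractional, credit for the people who produced the underlying data.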

Thus far, much of the effort by funders, who are key stakeholders, has focused on compliance. And in general, compliance has substantial limits as a design principle:

  • Compliance generates incentives to follow a stated rule — but not generally to go beyond to promote the values that motivated the rule.
  • Actors still need resources to comply, and as Chandler and other speakers on the panel on Supporting Organizations Facing The Challenges of Data Citation noted, compliance with data sharing is often viewed as an unfunded mandate.
  • Compliance-based incentives are prone to failure where the standards for compliance are ambiguous or conflicting.
  • Further, actors have incentives to comply with rules only when they have an expectation that behavior can be monitored, that the rule-maker will monitor behavior, and that violations of the rules will be penalized.
  • Moreover, external incentives, such as compliance, can displace existing internal motivations and social norms, [17] yielding a reduction in the desired behavior. Thus compliance alone cannot be expected to promote the values that motivated the rule.

Journals have increased the monitoring of data transparency and sharing, primarily through policies like PLOS’s that require the author to supply, before publication, a replication data set and/or an explicit data citation or persistent identifier that resolves to data in a well-known repository. This appears to be substantially increasing compliance with journal policies that had been on the books for over a decade.
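Part of what makes such policies enforceable is that a persistent identifier can be checked mechanically. A minimal sketch, using the standard DOI proxy at doi.org; the DOI below is a placeholder, not a citation to a real dataset, and some repositories reject HEAD requests, in which case a GET would be needed.

```python
# Minimal sketch: check that a cited DOI resolves via the doi.org proxy.
# The DOI below is a placeholder, not a real citation.
import requests

def doi_resolves(doi):
    """Return True if the DOI proxy redirects the DOI to a landing page."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=True, timeout=30)
    return resp.status_code < 400

print(doi_resolves("10.1234/placeholder-dataset"))
```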

However, neither universities nor funders routinely audit or monitor compliance with data management plans. As panelist Patricia Knezek emphasized, there are many open questions: how funders will monitor compliance, how to incent compliance after the award is complete, and how responsibility for compliance is divided between the funded institution and the funded investigator. Further, as noted in the panelists’ discussion with the workshop audience, data management plans for funded research are not made available to the public along with grant abstracts, which creates a barrier to community-based monitoring and norms; scientists in the federal government are not currently subject to the same data sharing and management requirements as scientists in academia; and there is a need to support ‘convening’ organizations such as FORCE11 and the Research Data Alliance to bring multiple stakeholders to the table to align strategies on incentives and compliance.

Finally, as Cliff Lynch noted in the final discussion session of the workshop, compliance with data sharing requirements often comes into conflict with confidentiality requirements for the protection of data obtained from individuals and businesses, especially in the social, behavioral, and health sciences. This is not a fundamental conflict: it is possible to enable access to data without any intellectual property restrictions while still maintaining privacy. [18] However, absent common policies and legal instruments for intellectually open but personally confidential data, confidentiality requirements remain a barrier (or sometimes an excuse) to open data.

References

[1] See for a review: Stephan PE. How economics shapes science. Cambridge, MA: Harvard University Press; 2012 Jan 15.

[2] Cronin B. The citation process: the role and significance of citations in scientific communication. London: Taylor Graham; 1984.

[3] Altman M, Crosas M. The evolution of data citation: from principles to implementation. IASSIST Quarterly. 2013;37(1-4):62-70.

[4] Altman M, King G. A proposed standard for the scholarly citation of quantitative data. D-Lib Magazine. 2007;13(3/4).

[5] See: http://journals.plos.org/plosone/s/data-availability

[6] See Final NIH Statement on Sharing Research Data, 2003, NOT-OD-03-032. Available from: https://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html

[7] Holdren, J.P. 2013, “Increasing Access to the Results of Federally Funded Scientific Research”, OSTP. Available from: https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf

[8]  King, Gary. “Replication, replication.” PS: Political Science & Politics 28.03 (1995): 444-452.

[9]  Altman M. Open source software for Libraries: from Greenstone to the Virtual Data Center and beyond. IASSIST Quarterly. 2002;25.

[10] See particularly the presentations and discussions on Tools and Connections; Supporting Organizations Facing The Challenges of Data Citation; and Reward/Incentive Structures.

[11] See http://dataverse.org/, http://ckan.org/, https://figshare.com/

[12] See http://www.re3data.org/, https://www.datacite.org/, http://wokinfo.com/products_tools/multidisciplinary/dci/

[13]  Borgman CL. The conundrum of sharing research data. Journal of the American Society for Information Science and Technology. 2012 Jun 1;63(6):1059-78.

[14] Read, Kevin B., Jerry R. Sheehan, Michael F. Huerta, Lou S. Knecht, James G. Mork, and Betsy L. Humphreys. “Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study.” PLoS ONE 10, no. 7 (2015): e0132735.

[15] See Katz, D.S., Choi, S.C.T., Wilkins-Diehr, N., Hong, N.C., Venters, C.C., Howison, J., Seinstra, F., Jones, M., Cranston, K., Clune, T.L. and de Val-Borro, M., 2015. Report on the second workshop on sustainable software for science: Practice and experiences (WSSSPE2). arXiv preprint arXiv:1507.01715.

[16] See Nosek BA, Spies JR, Motyl M. Scientific utopia II: restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science. 2012;7(6):615-31; and Brandon A, List JA. Markets for replication. Proceedings of the National Academy of Sciences. 2015;112(50):15267-15268.

[17] Gneezy U, Rustichini A. A fine is a price. Journal of Legal Studies. 2000;29(1):1-17.


Jul 21, 11:57am

Elon University’s Imagining the Internet Center aims to provide insights into emerging network innovations, global development, dynamics, diffusion, and governance. For over a decade, it has collaborated with the Pew Research Center to conduct regular expert surveys that support predictions in these areas.

Experts responding to the last survey, conducted in 2014, identified over twenty themes for the next decade, including:

  • “The spread of the Internet will enhance global connectivity that fosters more planetary relationships and less ignorance.”
  • “The Internet of Things, artificial intelligence, and big data will make people more aware of their world and their own behavior.”
  • “The spread of the ‘Ubernet’ will diminish the meaning of borders, and new ‘nations’ of those with shared interests may emerge and exist beyond the capacity of current nation-states to control. “
  • “Dangerous divides between haves and have-nots may expand, resulting in resentment and possible violence.”
  • “Abuses and abusers will ‘evolve and scale.’”
  • “Most people are not yet noticing the profound changes today’s communications networks are already bringing about; these networks will be even more disruptive in the future.”


The next wave of the survey is underway, and I was asked to contribute predictions for change in the next decade, as an expert respondent.  I look forward to seeing the cumulative results of the survey, which should emerge next year. In the interim, I share my formative responses to questions about the next decade:

… the next decade of public discourse… will public discourse online become more or less shaped by bad actors?

The design of current social media systems is heavily influenced by a funding model based on advertising revenue. One consequence is that these systems emphasize “viral” communication, which allows a single communicator to reach a large but interested audience, and devalue privacy; they are not designed to enable large-scale collaboration and discourse.

While the advertising model remains firmly in place, there has been increasing public attention to privacy and to the potential for manipulating attitudes through algorithmic curation. Yet I am optimistic that in the next decade social media systems will give participants more authentic control over sharing their information and will begin to facilitate deliberation at scale.

… the next decade of online education, and credentialing… which skills will be most difficult to teach at scale?

Over the last fifteen years we have seen increasing success in making open course content available, followed by success in teaching classes online at scale (e.g., Coursera, EdX). The next step in this progression will be online credentialing; last year’s partnership between Starbucks and ASU, which gives large numbers of Starbucks employees the opportunity to earn a full degree online, is indicative of this shift.

Progress in online credentialing will be slower than progress in online delivery, because of the need to comply with or modify regulation, establish reputation, and overcome entrenched institutional interests in residential education. Notwithstanding, I am optimistic that we will see substantial progress in the next decade, including more rigorous and widely accepted competency-based credentialing.

Given the increased rate of technical change, and the regular disruptions this creates in established industries, the most important skills for workforces in developed countries are those that support adaptability: skills that enable workers to engage with new technologies (especially information and communication technologies) and to collaborate effectively within different organizational structures.

While specific technical skills are well suited to self-directed learning, some skills that are particularly important for long-term success, notably metacognition, collaboration, and “soft” (emotional and social) skills, require individualized guidance, (currently) a human instructor in the loop, and the opportunity to interact richly with other learners.


… the next decade of algorithms —  will the net overall effect be positive or negative?

Algorithms are, in essence, mathematical tools designed to solve problems. Generally, improvements in problem-solving tools, especially in the mathematical and computational fields, have yielded huge benefits in science, technology, and health, and will most likely continue to do so.

The key policy question is really how we will choose to hold government and corporate actors responsible for the choices that they delegate to algorithms. There is increasing understanding that each choice of algorithm embodies a specific set of choices about what criteria are important to “solving” a problem, and what can be ignored. Incenting better choices of algorithms will likely require the actors who use them to provide more transparency, to design algorithms explicitly with privacy and fairness in mind, and to be held meaningfully responsible for the consequences.
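As a toy illustration of how the chosen criteria shape the outcome (the items, scores, and weights below are entirely hypothetical), two feed-ranking rules applied to the same items can produce different orderings depending on whether “solving” the problem means maximizing engagement alone or trading engagement off against, say, source diversity:

```python
# Toy illustration: two ranking criteria applied to the same items.
# The items, scores, and weights are hypothetical.
items = [
    {"id": "a", "engagement": 0.9, "source_diversity": 0.1},
    {"id": "b", "engagement": 0.6, "source_diversity": 0.7},
    {"id": "c", "engagement": 0.5, "source_diversity": 0.9},
]

# Criterion 1: engagement is all that matters.
by_engagement = sorted(items, key=lambda x: x["engagement"], reverse=True)

# Criterion 2: engagement traded off evenly against source diversity.
by_blend = sorted(
    items,
    key=lambda x: 0.5 * x["engagement"] + 0.5 * x["source_diversity"],
    reverse=True,
)

print([x["id"] for x in by_engagement])  # ['a', 'b', 'c']
print([x["id"] for x in by_blend])       # ['c', 'b', 'a']
```

Neither ordering is “correct” in the abstract; each simply encodes a different judgment about what the problem is.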


… the next decade of trust —  will people disengage with social networks, the internet, and the Internet of Things?

It appears very likely that, because of network effects, people’s general use of these systems will continue to increase, whether or not the systems themselves actually become more trustworthy. The value of online markets (and similar platforms) is often a growing function of their size, which creates a form of natural monopoly and makes these systems increasingly valuable, ubiquitous, and unavoidable.
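One common stylized way to express this (an illustration of network effects in general, not a claim from the survey itself) is Metcalfe’s law, in which a network’s value grows roughly with the number of possible connections among its users, so the largest network pulls ever further ahead:

```python
# Stylized network-effect model (Metcalfe-style): value grows roughly with
# the number of possible pairwise connections among n users.
def network_value(n_users, value_per_connection=1.0):
    return value_per_connection * n_users * (n_users - 1) / 2

for n in (10, 100, 1000):
    print(n, network_value(n))
# 10 -> 45.0, 100 -> 4950.0, 1000 -> 499500.0: a tenfold increase in users
# yields roughly a hundredfold increase in value, favoring the largest network.
```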

The trustworthiness of these systems remains in doubt. It could be greatly improved by providing users with more transparency, control, and accountability. Technologies such as secure multi-party computation, functional encryption, unalterable blockchain ledgers, and differential privacy have great potential to strengthen these systems, but so far the incentives to deploy them at wide scale are missing.
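As one concrete example of these technologies, here is a minimal sketch of the Laplace mechanism from differential privacy, which lets an operator release an aggregate count while bounding what can be inferred about any one individual. The epsilon value and the count below are purely illustrative.

```python
# Minimal sketch of the Laplace mechanism from differential privacy:
# add Laplace noise scaled to sensitivity/epsilon before releasing a count.
import numpy as np

def dp_count(true_count, epsilon=0.5, sensitivity=1.0):
    """Release an epsilon-differentially-private count.

    Adding or removing one person changes a count by at most `sensitivity`,
    so Laplace noise with scale sensitivity/epsilon satisfies epsilon-DP.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

print(dp_count(1234))  # noisy, but close to the truth for large counts
```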

Structurally, many of the same forces that drive use of online networks will drive this as well. It appears very likely that, because of network effects, people’s general use of connected devices will continue to increase, whether or not the devices themselves actually become more trustworthy.

The IoT network is at an earlier stage than social networks: it returns less immediate value, and there is not yet a dominant “network” of these devices. It may take some time for a valuable network to emerge, so the incentives for end consumers to use IoT so far seem small, while the security issues loom large, given the current lack of attention to systematic security engineering in the design and implementation of these systems. (The low visibility of security reduces the incentives for such engineering.) However, it seems likely that within the next decade the value of connected devices will become sufficient to drive people to use them regardless of the security risks, which may remain serious but are often less immediately visible.

Reflecting on my own answers, I suspect I am reacting more as a “hedgehog” than as a “fox,” and thus am quite likely to be wrong (on this, see Philip Tetlock’s excellent book Expert Political Judgment). In my defense, I will recall the phrase that physicist Dennis Gabor once wrote: “we cannot predict the future, but we can invent it.” This is very much in the spirit of MIT. And as we argue in Information Wants Someone Else to Pay for It, the future of scholarly communications and the information commons will be a happier one if libraries take their part in inventing it.

* This quote is most often attributed to Yogi Berra, but he denied it, at least in e-mail correspondence with me in 1997. It has also been attributed (with disputation) to Woody Allen, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. DeMille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, and Kerr L. White.

