Dr. Anthony Scriffignano, who is SVP/Chief Data Scientist at Dun and Bradstreet, gave this talk, “Making Decisions in a World Awash in Data: We’re going to need a different boat”, as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Scriffignano argues that the massive collection of ‘unstructured’ data enables a wide set of potential inferences about complex, changing relationships. At the same time, the talk notes that it is increasingly easy to gather enough information to take action while still lacking enough to form good judgement. Understanding the context in which data is collected and flows is essential to developing such judgement.
Scriffignano summarizes his talk in the following abstract:
I explore some of the ways in which the massive availability of data is changing and the types of questions we must ask in the context of making business decisions. Truth be told, nearly all organizations struggle to make sense out of the mounting data already within the enterprise. At the same time, businesses, individuals, and governments continue to try to outpace one another, often in ways that are informed by newly-available data and technology, but just as often using that data and technology in alarmingly inappropriate or incomplete ways. Multiple “solutions” exist to take data that is poorly understood, promising to derive meaning that is often transient at best. A tremendous amount of “dark” innovation continues in the space of fraud and other bad behavior (e.g. cyber crime, cyber terrorism), highlighting that there are very real risks to taking a fast-follower strategy in making sense out of the ever-increasing amount of data available. Tools and technologies can be very helpful or, as Scriffignano puts it, “they can accelerate the speed with which we hit the wall.” Drawing on unstructured, highly dynamic sources of data, fascinating inference can be derived if we ask the right questions (and maybe use a bit of different math!). This session will cover three main themes: The new normal (how the data around us continues to change), how are we reacting (bringing data science into the room), and the path ahead (creating a mindset in the organization that evolves). Ultimately, what we learn is governed as much by the data available as by the questions we ask. This talk, both relevant and occasionally irreverent, will explore some of the new ways data is being used to expose risk and opportunity and the skills we need to take advantage of a world awash in data.
This covers a broad scope, and Dr. Scriffignano expands extensively on these and other issues in his blog, which is well worth reading.
Dr. Scriffignano’s talk raised a number of interesting provocations. The talk claims, for example, that:
- No data is real-time — there are always latencies in measurement, transmission, or analysis.
- Most data is worthless — but there remains a tremendous number of useful signals in data that we don’t understand.
- Eighty-five percent of data collected today is ‘unstructured’, and ‘unstructured’ data is really data whose structure we do not yet understand.
On using data:
- Unstructured data has the potential to support many unanticipated inferences. An example (which Scriffignano calls a “data-bubble”) is a set of crowd-sourced photographs of recurring events: one can find photos that were taken at different times but show the same location from the same perspective. Despite being convenience samples, they permit new longitudinal comparisons from which one could extract signals of fashion, attention, technology use, attitude, etc. In this way, big data collection has created qualitatively new opportunities for inference.
- When collecting and curating data, we need to pay close attention to decision-elasticity: how different would our information have to be to change our optimal action? In designing a data curation strategy, one needs to weigh the opportunity costs of obtaining and curating data against the potential to affect decisions.
- Increasingly, big data analysis raises ethical questions. Some of these arise directly: what are the ethical expectations on the use of ‘new’ signals that we discover can be extracted from unstructured data? Others arise through the algorithms we choose: how do they introduce biases, and how do we even understand what algorithms do, especially as the use of artificial intelligence grows? Scriffignano’s talk gives as an example recent AI research in which two algorithms developed their own private encryption scheme.
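The decision-elasticity question above (how different would our information have to be to change our optimal action?) can be made concrete with a small sketch. This is only an illustration under simplifying assumptions, not a method presented in the talk: it assumes a single uncertain yes/no fact, two candidate actions, and known payoffs. The function names, the credit scenario, and all numbers are hypothetical.

```python
def expected_payoff(p, payoffs):
    """Expected payoff of one action, given probability p that the fact is true.

    payoffs is a (payoff_if_true, payoff_if_false) pair.
    """
    payoff_if_true, payoff_if_false = payoffs
    return p * payoff_if_true + (1 - p) * payoff_if_false


def switching_probability(action_a, action_b):
    """Probability at which both actions have equal expected payoff.

    Returns None when the actions never swap rank for p in [0, 1],
    i.e. no new information about p could change the decision.
    """
    a_true, a_false = action_a
    b_true, b_false = action_b
    denom = (a_true - a_false) - (b_true - b_false)
    if denom == 0:
        return None  # payoff lines are parallel and never cross
    p = (b_false - a_false) / denom
    return p if 0 <= p <= 1 else None


def belief_shift_needed(current_p, action_a, action_b):
    """How far the estimate of p must move before the better action changes."""
    p_switch = switching_probability(action_a, action_b)
    if p_switch is None:
        return None
    return abs(current_p - p_switch)


# Hypothetical scenario: extend credit to a customer, or decline.
extend = (100, -300)   # gain if the customer is reliable, loss if not
decline = (0, 0)       # nothing gained or lost either way

current_estimate = 0.875  # assumed current belief that the customer is reliable
threshold = switching_probability(extend, decline)
margin = belief_shift_needed(current_estimate, extend, decline)

print(f"Extend credit only if P(reliable) > {threshold}")
print(f"New data must shift the estimate by more than {margin} to change the decision")
```

Under these assumed numbers, further data collection is worth its cost only if it could plausibly move the estimate across the threshold. If no realistic new evidence would shift the belief that far, the decision is inelastic with respect to that data.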
This is directly relevant to the future of research, and the future of research libraries. Research will increasingly rely on evidence sources of these types, and there will be an increasing need to access, discover, and curate this evidence. Our society will increasingly be shaped by this information, and by how we choose to engineer and govern its collection and use. The private sector is pushing ahead fast in this area, and will no doubt generate many innovative data collections and algorithms. Engagement from university scholars, researchers, and librarians is vital to ensure that society understands these new creations; is able to evaluate their reliability and bias; and has durable and equitable access to them to provide accountability and to support important discoveries that are not easily monetized. For those interested in this topic, the Program on Information Science has published reports and articles on big data inference and ethics.