How do we communicate research findings based on new and unconventional data sources? I explore some of the challenges involved, and strategies to overcome them, using a two-by-two matrix that considers the sophistication of the audience and how surprising the findings are.
John Maynard Keynes once said, in relation to investors' reluctance to 'step out of the box', that “worldly wisdom dictates that it is better for the reputation to fail conventionally than to succeed unconventionally.” [1] Their situation has some similarities with the risks faced by a researcher considering new sources of data, such as those falling under the vague umbrella of big data.[2] In this case, data unconventionality carries risks such as:
These are important challenges for us. We are involved in several projects that use new sources of data with the goal of generating new knowledge relevant for policymakers, entrepreneurs and managers.[3] They include:
…among others. The fact that none of these projects would have been practical until recently underscores the fresh opportunities opened up by new data sources. However, using these data sources also presents risks – such as biases and measurement errors like those highlighted in Tim Harford’s recent Financial Times article (see also Kate Crawford’s writing on this, and this blog by Nesta colleague Andrew Whitby).
How do we communicate the findings of this research? How can we simultaneously convey the novelty and value of our findings, caveat our findings, and avoid slipping into trivialities?
These old problems can be intensified by quality concerns over new data sources. Here I map them, together with potential strategies to address them, using that trusty complexity-reduction tool: the two-by-two matrix.
The two dimensions of this matrix capture features of situations in which we communicate research findings based on new and unconventional sources of data.[4] They are:
Let’s go through each of these quadrants in turn, starting with...
The matrix above doesn’t consider the purpose of the research we are communicating. The goals and challenges for maps and early warning systems aimed at ‘making the invisible visible’ will differ from those for impact evaluations, where establishing causality matters more. Perhaps we could add another dimension to the matrix and turn it into a cube? Next time.
For now, suffice it to highlight one strategy that helps regardless of the quadrant we are in: understanding the data we are using and its limitations, being transparent about our methods, and setting our findings in the context of wider literatures and experiences (including the domain knowledge of people in the field). In other words, the kind of thing taught in Research Methods 101. Hardly unconventional, but simply what’s needed if we are going to create long-lasting value from these new, exciting and (for now) unconventional data sources.
(We’ll be sharing our experiences, findings and lessons from working with these data sources in future posts. If you have any questions, use the comment box below, or contact me by email or at @JMateosGarcia.)
(Image: Gzthermal by Scott Schiller.)
[1] I read this quote in Peter Bernstein’s wonderful history of risk, ‘Against the Gods’.
[2] For example, ‘found’ data from the web, which may not be big in volume but is varied in structure, and has velocity in the sense that it is recent.
[3] Even when we use traditional data collection methods such as surveys, the risks identified above still apply to some extent, because we are looking at relatively new and poorly understood phenomena (for example, the adoption of data practices in our datavores stream of work) and working with small sample sizes.
[4] A caveat: this is, of course, a simplification of reality. Most research will fall somewhere along a continuum on each of our two dimensions.
[5] To be sure, research can have multiple audiences and be communicated at multiple levels.
[6] For simplicity, let’s assume that the findings of the research are either all expected or all unexpected. If they are not, simply consider each finding independently of the others.