There's been a lot of interest in this idea of collective wisdom recently, and enthusiasm for data-driven forecasting - but how do we get better at predicting the future?
Off the back of 18 years cataloguing tens of thousands of experts' forecasts on political events, Philip Tetlock famously denounced the predictive powers of political scientists, policy wonks and media pundits as little better than chance. But rather than going on to dismiss such forecasting as impossible, Tetlock put his efforts into working out how to do better.
Tetlock co-leads the Good Judgment Project, a large-scale, US-government-funded research programme into better forecasting. It's invited thousands of people (though mostly men, curiously) to predict events that matter to the intelligence community – the collapse of a regime, an election result, a country defaulting, that sort of thing.
The headline finding so far has been that some people really are good at foresight, and forecasting skills can be honed. But the bulk of the work has been in experimenting with ways to collect individual judgments together and produce crowd forecasts that are more reliable than lone experts.
There's been a lot of interest in this idea of collective wisdom recently, with Good Judgment at the forefront. Prediction markets, in which people bet against each other on the likelihood of events, have, despite some hiccups, proliferated.
Alongside much enthusiasm for data-driven forecasting – scraping the internet or company data for early signals of future trends – these quantitative ways of harnessing human judgment show great promise.
But questions remain about how useful these kinds of predictions are, in which contexts they work and don't, and whether some of the experimental findings of the Good Judgment Project (and others) can be helpful in the real world.
I'm doing some research into prediction markets, tournaments and related methods to see what we can learn from them about how to better predict the future – be it policy-relevant geopolitical and macroeconomic events, technological disruption or short-term internal forecasting that can help companies make better strategic decisions.
I'll produce a Nesta working paper in January. Here are some thoughts so far:
What it takes isn't anything surprising. Better forecasters are 'actively open-minded': self-critical, open to contradictory evidence and willing to revise their beliefs when new evidence comes in. But what's encouraging is that a very small amount of training goes a long way to improve these skills: at Good Judgment, training in how to overcome bias improves a forecaster's accuracy by 10-15 per cent. That's impressive given that it's just an hour's training (some participants go on to spend about 100 hours forecasting a year). Practice and feedback help a huge amount, too.
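To make 'accuracy' and 'feedback' concrete: tournaments of this kind typically score forecasters with a Brier score, the squared error between the probability stated and what actually happened, where lower is better. The sketch below is a minimal, hypothetical Python illustration of that scoring, not Good Judgment's actual code; the forecasts and outcomes are invented.

```python
# Minimal sketch: scoring forecast accuracy with a (binary) Brier score.
# 0.0 is perfect; always saying "50%" earns 0.25; lower is better.

def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities (0-1) and outcomes (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecasts on three binary questions, and what actually happened.
forecasts = [0.7, 0.2, 0.9]   # e.g. "70% chance the regime collapses this year"
outcomes  = [1,   0,   1]     # 1 = the event occurred, 0 = it didn't

print(round(brier_score(forecasts, outcomes), 3))  # 0.047 -> confident and well calibrated
```

Feedback then amounts to showing forecasters how their score moves, question by question, as they practise.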
There's no single trick to superior forecasting. Rather it's a combination of things – including training, putting forecasters in teams, aggregation and weighting methods – that each increase accuracy by 10 per cent or so. But these add up – put it all together and you get very good predictions.
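To make 'aggregation and weighting methods' concrete, here is one simple, assumed approach: averaging individual probability forecasts, optionally weighting forecasters by their track record. Good Judgment's actual aggregation algorithms are more sophisticated; this is a minimal sketch for illustration, with invented numbers.

```python
# Minimal sketch: pooling individual probability forecasts into a crowd forecast.
# Weights are assumed to reflect each forecaster's past accuracy.

def crowd_forecast(probabilities, weights=None):
    """Weighted average of individual probabilities for a single binary event."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)

individual   = [0.6, 0.75, 0.55, 0.9]   # four forecasters' probabilities
track_record = [1.0, 2.0, 1.0, 3.0]     # better past accuracy -> bigger weight

print(crowd_forecast(individual))                          # 0.7  (plain average)
print(round(crowd_forecast(individual, track_record), 2))  # 0.76 (tilted towards stronger forecasters)
```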
Prediction markets are motivated by the idea that financial markets reflect differing predictions about the future returns of an asset, settling on a price that reveals a collective judgment. Markets are interesting because, though sometimes very wrong, they're often smarter than the individuals within them: investors rarely out-perform the market over the long term.
Prediction markets create markets solely for this predictive quality, with traders buying and selling contracts tied to a specific event. Interpreting prices as probabilities, these markets have been found to be remarkably accurate at predicting things like elections.
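As an illustration of interpreting prices as probabilities: a binary contract typically pays a fixed amount (say £1) if the event happens and nothing otherwise, so the price it trades at can be read as the crowd's implied probability. A minimal, assumed sketch, ignoring fees, spreads and thin trading:

```python
# Minimal sketch: reading a binary prediction-market price as a probability.
# Assumes a contract paying 1.00 if the event happens and 0 otherwise.

PAYOUT = 1.00

def implied_probability(last_traded_price, payout=PAYOUT):
    """Price as a fraction of the payout is the market's implied probability."""
    return last_traded_price / payout

# e.g. a 'Candidate X wins the election' contract last traded at 0.63
print(implied_probability(0.63))  # 0.63, i.e. a 63% implied chance
```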
"Better forecasters are 'actively open-minded': self-critical, open to contradictory evidence and willing to revise their beliefs when new evidence comes in"
Though the best-known prediction market, InTrade, collapsed in spectacular, scandalous fashion in 2013, a growing number of markets let people speculate on politics, sport, economics and entertainment (even in Bitcoin, for those adventurous enough to speculate with a speculative currency).
Though all manner of caveats apply, their predictive record remains very good – or at least better than that of other methods: whilst polling in the run-up to the Scottish referendum was all over the place, markets fared pretty well. Good Judgment has found its play-money markets to be highly effective, though less good at longer-term forecasts than other methods.
Many companies have set up markets - they're good at predicting whether a production deadline will be missed, for example. But most have been bad at implementing them – setting them up poorly and asking interesting but not useful questions, judging markets by activity but not accuracy, and not acting on the results. Play-money prediction markets have so far been more about the play than anything else.
Good Judgment's initial findings suggest that when teams share information and debate their forecasts, all do better. That's interesting, because a large body of social psychology going back to the 1970s has much to say about the biases and 'groupthink' that distort group decisions. Groups tend to emphasise opinions that everyone shares rather than unearth the little pieces of information spread amongst their members; they tend to be framed by the first things discussed; and fear for their reputation makes some people less willing to speak out on what they know.
That results in groups converging on bad judgments, and it's why many have argued for prediction markets' superiority, where information is shared indirectly and anonymously. But Good Judgment has found little evidence of those effects in a forecasting-focussed environment.
"Alongside much enthusiasm for data-driven forecasting, these quantitative ways of harnessing human judgment show great promise - but questions remain about how useful these kinds of predictions are"
This raises a practical point about learning from Good Judgment, which is that it's an artificial environment in which forecasters have no incentive but to produce the best forecasts they can. Though there's some reputational pressure (it's a competition, after all), no jobs are on the line.
Those unbiased dynamics and pure incentives will be hard to bring out of the lab and into government, think tanks and companies. And organisations can't easily set up dedicated forecasting tournaments as well-funded and well-resourced as the Good Judgment Project.
So it may be that whilst polling works best in the lab, prediction markets – in which groups 'share' information, through their trading behaviour rather than discussion – work better, and are easier to implement, in the real world. So far they've done well, but there's more we can learn about how to set them up properly – designing them right, and asking the right questions.
Prediction polls and markets disallow pundits' vague, weaselly verbiage by asking yes-no questions about highly specific events. But there's a rigour-relevance trade-off: you want unambiguous questions, but need to balance this against the policy relevance, and usefulness, of the question.
It's a criticism Nassim Nicholas Taleb has made, calling prediction markets 'ludicrous': his point is not that these tools fail to predict black swans (of course they do), but that you want to know not simply whether there'll be a war, say, but also the magnitude and kind of war it will be.
Robin Hanson, an economist at George Mason University and a prediction market architect and advocate, meanwhile suggested to me that Good Judgment's questions aren't particularly useful because they're far from "actionable". He suggests asking more decision-relevant questions – such as about the consequences of taking a certain action.
"Whilst polling works best in the lab, prediction markets...work better, and are easier to implement, in the real world"
As Good Judgment nears its close, it's running some more speculative experiments: for example placing its best forecasters (superforecasters) in prediction markets ('supermarkets'). The idea of markets full of highly-engaged superforecasters is interesting because markets are normally thought to need smart marginal traders, known as sharks, as well as fish, who place a bet here and there but lose money, feeding the sharks.
But what happens in a market in which everyone is a shark? Good Judgment has also recently started experimenting with ways of making "fuzzier" predictions and conditional forecasts (if this happens, then will that happen?). If these forecasts are accurate, they'll be much more useful.
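To show what a conditional forecast looks like in practice: it pairs a condition with a question, and one common convention is to score it only if the condition comes true, voiding it otherwise. The sketch below assumes that convention for illustration; it is not Good Judgment's actual question format.

```python
# Minimal sketch of a conditional forecast, e.g. "if sanctions are lifted,
# will oil exports rise within six months?" Scored only if the condition
# occurs; otherwise the forecast is voided.

def score_conditional(p_if_condition, condition_occurred, outcome_occurred):
    """Return a Brier-style squared error, or None if the condition never happened."""
    if not condition_occurred:
        return None                      # voided: nothing to score against
    return (p_if_condition - (1 if outcome_occurred else 0)) ** 2

print(score_conditional(0.8, condition_occurred=True, outcome_occurred=True))    # ~0.04
print(score_conditional(0.8, condition_occurred=False, outcome_occurred=False))  # None (voided)
```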
All that is to say that methods like these are tools among many that can aid decision-making, but shouldn’t be expected to displace long-term, uncertainty-and-complexity-embracing scenario planning and other foresight approaches. They will only be useful when information (or expertise) that can help make a prediction is ‘out there’ in the world, and distributed among many people.
That applies to most of the forecasts – implicit and explicit – that people, firms and governments make every day, so it's worth working out how to make them better.