In the first part of our crowd predictions results series, we summarise who took part, what happened and what it means.
We launched our crowd predictions challenge in December 2018, at a time when Brexit fever was at an all-time high and the first Article 50 deadline loomed. Our aim was to test whether crowd wisdom could be maintained in the face of the radical uncertainty caused by Brexit. We also wanted to see what effects demographic factors such as gender, age and location might have on prediction accuracy and forecasting behaviour.
The method at the heart of the challenge was prediction polling, which allows individuals to keep adapting their predictions as they discover new information or as the context around the issue unexpectedly changes. It was first developed in 2014 by the researchers behind the Good Judgment Project (GJP), whose initial experiments showed that amateur crowds could generate surprisingly accurate predictions about the likelihood of geopolitical events.
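To make the mechanics concrete, the sketch below shows one simple way prediction-poll responses can be aggregated: keep each forecaster's most recent estimate for a question and take the median. This is an illustrative example with invented names and numbers, not the exact aggregation used in our challenge or by the GJP.

```python
from datetime import datetime
from statistics import median

# Each record is one forecaster's update for a question:
# (forecaster id, timestamp, probability assigned to the outcome).
forecasts = [
    ("alice", datetime(2019, 3, 1), 0.70),
    ("alice", datetime(2019, 3, 20), 0.85),  # Alice revises upwards after new information
    ("bob",   datetime(2019, 3, 5), 0.60),
    ("carol", datetime(2019, 3, 18), 0.75),
]

def crowd_probability(forecasts):
    """Keep only each forecaster's most recent prediction, then take the median."""
    latest = {}
    for person, when, prob in forecasts:
        if person not in latest or when > latest[person][0]:
            latest[person] = (when, prob)
    return median(prob for _, prob in latest.values())

print(crowd_probability(forecasts))  # 0.75
```

Because forecasters can revise their estimates at any time, the aggregate naturally shifts as the situation on the ground changes.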
The appetite for predicting the future is well and truly alive, even beyond seasoned forecasters. The individuals who made up our crowd came from a variety of backgrounds, and almost half of them registered their demographic information, giving us some insight into the types of people taking part. So what do we know about the crowd?
Throughout the year we asked 11 questions related to Brexit, covering topics from Article 50 and election outcomes to UK house prices and the value of the Pound against the Euro. The crowd's overall success rate on these questions was 73%. The rest of the challenge questions focused on science, technology and health, and the crowd predicted correctly on 63% of these. Not bad for an amateur crowd!
When it came to getting it right on questions around Brexit, our forecasters accurately predicted that Article 50 would be extended in March 2019 and then again in November 2019. The crowd also predicted the vote share that the Brexit Party and Change UK would gain in the European parliamentary elections, that Theresa May would still be prime minister on 1 July (just) and that the Conservative Party would win a majority in the UK’s general election in December 2019. While the answers to all of these questions may seem obvious in retrospect, this is most likely due to hindsight bias. At the time even experts were reluctant to make predictions.
Unlike the majority of the Brexit questions, two of the ones our forecasters got wrong depended more heavily on market dynamics, which may be harder for amateurs to forecast than questions shaped largely by public opinion. Our participants had correctly predicted the exchange rate between the Pound and the Euro when we posed the same question earlier in 2019, but the late summer and early autumn marked a particularly turbulent period in British politics. We discuss the possible reasons for these incorrect forecasts, and the other questions the crowd got wrong, in the second part of this series.
The demographic information provided by participants allowed us to analyse whether certain characteristics differentially affected the performance of our crowd. Only the registered forecasters were included in these analyses [2], which revealed that:
25-35 year olds were significantly more accurate** than other age groups across all of the questions. A possible reason for this difference could be the higher and more varied media consumption of this age group, which may give them access to more diverse sources of information to inform their forecasts. We saw no conclusive evidence of a difference in accuracy or forecasting behaviour between male and female forecasters; in fact, both groups showed similar tendencies in participation, topic preferences and commenting behaviour.
US participants were less accurate** than participants from the UK or other locations. Given the high number of questions with a Brexit focus, this may suggest that forecasters embedded in the context have an advantage over those who are more removed from the on-the-ground dynamics of an issue. The UK participants also contributed to more questions on average than those from the US, which might indicate that they were more engaged with the challenge topics overall.
One of the most unexpected results from the challenge is that experienced forecasters were less accurate* than those who were completely new to forecasting. This might be due to overconfidence associated with the Dunning-Kruger effect, which refers to the tendency of some individuals to be unaware of their own ignorance on particular topics. This particular bias is known to interfere with the forecasting ability of experts. More research would be needed to look into this and other potential causes of the result, but it does suggest that in periods of radical uncertainty, amateurs may be more aware of the limits of their own judgement and calibrate their estimates accordingly.
The final analyses we ran looked at how well different groups performed on the Brexit-related challenge questions compared with their performance on other topics. Overall, we found that three groups were significantly more accurate on Brexit questions than on other topics: 18-24 year olds, those without prior forecasting experience and men. In fact, 18-24 year olds were significantly more accurate than all other age groups when it came to forecasting Brexit.
These results seem to suggest that when it came to Brexit (and perhaps any equivalent period without historical precedent), individuals without experience or existing expectations about how things should turn out may be better able to assess the available evidence and estimate the likelihood of specific outcomes. Of course, this doesn't explain why the same pattern was seen in men, so further experiments would be needed to get definitive insight into the source of these differences.
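For readers curious about the mechanics behind the "significantly more accurate" claims, here is a rough sketch of the kind of comparison involved, assuming forecasters are scored with Brier scores and groups are compared with a non-parametric test. The simulated data, group labels and choice of test are illustrative assumptions, not our actual analysis pipeline.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes; lower is better."""
    probs, outcomes = np.asarray(probs), np.asarray(outcomes)
    return float(np.mean((probs - outcomes) ** 2))

# Simulate 20 binary questions and two hypothetical groups of 40 forecasters each.
outcomes = rng.integers(0, 2, size=20)
group_a = [brier_score(np.clip(outcomes + rng.normal(0, 0.25, 20), 0, 1), outcomes)
           for _ in range(40)]  # e.g. forecasters new to forecasting (less noisy by construction)
group_b = [brier_score(np.clip(outcomes + rng.normal(0, 0.35, 20), 0, 1), outcomes)
           for _ in range(40)]  # e.g. experienced forecasters (noisier by construction)

# Non-parametric test of whether one group's scores tend to differ from the other's.
stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.4f}")  # compare p with the thresholds listed in [2]
```

The same pattern of comparison can be repeated per demographic split (age, gender, location, experience) and per question topic.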
For two of the non-Brexit questions that the crowd forecast incorrectly, the absence of a substantial number of local forecasters may have played a part. These two questions asked about the likelihood of Ebola transmission beyond the Democratic Republic of Congo and Uganda, and the number of CRISPR babies that would be born in 2019. The events related to these questions played out in East Africa and China respectively, where we had very few forecasters. This stands in contrast with two other location-specific non-Brexit questions, both related to US events: we asked the crowd to forecast the total number of measles cases and the number of Category 4 hurricanes. US forecasters made up a quarter of our registered predictors, and in both cases the crowd assigned the highest likelihood to the correct answer. Further experiments comparing the accuracy of crowds with different proportions of local forecasters would be needed to confirm this effect.
As the world battles the COVID-19 pandemic, the value of accurate predictions has come into sharp relief. Since the pandemic took hold worldwide, governments and international agencies have organised forecasting competitions to develop statistical models of the spread of the virus or to simulate the impact of various interventions. One of these efforts is being spearheaded by a team of researchers at Carnegie Mellon University (CMU) in partnership with the US Centers for Disease Control and Prevention. This team has an impressive track record, having produced the most accurate seasonal flu predictions of the past five years. The secret to their success? A method that combines nowcasting (accurately estimating the current number of infections) and forecasting (predicting the future number of infections) using both AI models and crowd forecasts. Their ongoing effort to model the spread of COVID-19 in the US also makes use of their dedicated wisdom-of-crowds platform, Crowdcast.
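CMU's exact methodology isn't covered here, but the general idea of blending model output with crowd forecasts can be illustrated with a minimal weighted-average ensemble like the one below. The numbers, weights and variable names are all invented for the example.

```python
import numpy as np

def ensemble_forecast(model_preds, crowd_preds, model_weight=0.5):
    """Blend a statistical model's forecast with a crowd forecast as a weighted average.
    In practice the weights could be tuned from each source's historical accuracy."""
    model_preds = np.asarray(model_preds, dtype=float)
    crowd_preds = np.asarray(crowd_preds, dtype=float)
    return model_weight * model_preds + (1 - model_weight) * crowd_preds

# Hypothetical weekly case counts (in thousands) forecast for the next four weeks.
model_forecast = [120, 135, 150, 170]   # e.g. output of a statistical/AI model
crowd_forecast = [110, 140, 160, 155]   # e.g. median of submissions on a crowd platform

print(ensemble_forecast(model_forecast, crowd_forecast, model_weight=0.6))
# [116. 137. 154. 164.]
```

The appeal of this kind of combination is that each source can compensate for the other's blind spots: models capture historical regularities, while crowds can react to information the models have not yet seen.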
CMU’s approach demonstrates the value of drawing on all of the tools available to us in the face of complex problems. This is collective intelligence at its best, where the complementary strengths of people and technology are combined in a way that makes sense for the problem at hand. Neither is enough in isolation.
At the start of the year, the government advisor Dominic Cummings called for "the science of prediction" and individuals with diverse skill sets to be brought into government decision-making. The results of our year-long experiment with crowd predictions help to make this case. They suggest that different demographic groups diverge in their overall forecasting accuracy and behaviours. Lack of forecasting experience also played a surprising role. So isn't it time that governments, companies and institutions worldwide looked beyond the usual suspects and asked the crowd?
[1] Up to 40% on some questions.
[2] Differences in accuracy reported at the following levels of significance: *** = p < 0.001, ** = p < 0.01, * = p < 0.05.
Crowd Predictions: The forecasts in full