AI is increasingly being used within collective intelligence research and practice. Our analysis of almost 40,000 research articles published in the last 20 years reveals the trends in this fast-growing field. Research at the crossover between the fields shows little topic diversity or disciplinary breadth, and this may be having a spillover effect on non-AI collective intelligence research. We also find that industry and, increasingly, China are setting the trajectory.
Understanding how AI can help us enhance collective human efforts to solve complex problems is at the heart of Nesta’s vision for a public interest AI. AI is increasingly being used within all fields and collective intelligence is no exception. Earlier this year, our report on the Future of Minds & Machines first drew attention to the need for more imaginative approaches to combining AI & collective intelligence (CI) in practice. By mapping case studies of collective intelligence in action, we found that most projects applied a fairly narrow range of AI methods to make sense of vast amounts of passively generated or actively crowdsourced user content. Almost all of these methods rely on big datasets and use machine learning to find structure and patterns in “messy” data.
Far fewer projects used more novel AI approaches: this was true in terms of both methods and the kinds of tasks AI was being used for. For example, hardly any projects innovated with less popular AI techniques such as distributed AI, autonomous systems and evolutionary methods. And despite notable exceptions like the citizen science platform Zooniverse, there were few that applied AI in novel ways, for example, to improve the ways that information and ideas were shared or to enhance the collective output of the group during problem solving.
In order to better understand whether the barriers to imaginative combinations of AI and collective intelligence can be explained by the underlying research pipeline, we undertook a mapping and analysis of the academic literature in AI, CI and the crossover between them. We looked at three categories of publications:
CI: all non-AI collective intelligence literature,
AI: all non-CI literature on artificial intelligence,
AI+CI: the intersection between AI and CI.
We looked at how these fields have evolved since 2000 to shed light on the dynamics of the CI ecosystem and to identify trends and opportunities for the future, so that researchers, practitioners and funders can make better decisions to advance the field. We also call on the politicians and civil servants involved in setting AI industrial policy to take note of these trends.
Even though the total number of articles on the topic of AI in the last 20 years dwarfs the other 2 fields, the rate of increase in publications for CI and AI+CI has been significantly faster. Both showed a roughly similar rate of increase between 2008 and 2015 before diverging. In the last 5 years, AI+CI publications have continued to grow at a faster pace while rates of CI publications plateaued and even slightly declined.
The main contributor to CI and AI+CI crossover research has consistently been the US (accounting for more than a quarter of the publications in our sample, 27%), with the UK in comfortable second place (17%), but in recent years research output from China has been gaining ground. Since 2014, the total number of publications from the US and UK has remained more or less consistent, while China’s research output in both CI and AI+CI has tripled over the same timeframe. This increase in publications from China coincides with the country’s growing dominance in AI: China produced 17% of the AI publications in our sample, compared with 19% and 13% from the US and UK, respectively.
AI+CI publications from industry are growing at a much faster rate than those from academia. This stands in contrast with publications on either AI or CI alone, which have shown a similar rate of increase from researchers based in companies versus academia. Microsoft is responsible for the highest share of AI+CI publications amongst big tech companies. In general, AI+CI research tends to have a higher proportion of cross industry-academia collaboration (between 10% and 15% of all published papers since 2013) than either AI or CI alone, although this has fallen to below 10% in the last 2 years.
Over the last 20 years, CI research has maintained high disciplinary breadth, with publications spanning Political Science, Engineering, Sociology, Computer Science and more. In contrast, AI+CI publications tend to fall within Computer Science, with a small proportion in Mathematics and Engineering. AI+CI research has become slightly more cross-disciplinary since 2000, perhaps reflecting the influence of CI. Even more striking is the increased significance of Computer Science in “pure” CI research (more than 30% since 2004), with a commensurate reduction in publications from Sociology, Political Science and other fields. Readers can track these changes using the interactive chart below.
Since 2008, there has been a noticeable shift in AI+CI research towards three topics, namely ‘crowdsourcing’, ‘machine learning’ and, since 2015, ‘deep learning’. Even “pure” CI research has experienced a striking increase in ‘crowdsourcing’ publications in comparison to other topics. Only ‘citizen science’ has shown a similar rise as a proportion of CI publications. Overall, AI+CI may be shifting away from a more integrated human-machine interaction (as evidenced by the fall in popularity of the terms ‘social computing’ and ‘human computation’) towards the more transactional relationship to human labour demanded by supervised machine learning and deep learning algorithms. We invite readers to explore the changes in popular topics using the interactive charts below.
Our analysis revealed that the top 20 publication venues (both conferences and journals) for the fields of AI and CI are largely non-overlapping. Only one conference and one journal featured in the top 20 for all three categories (AAAI and IEEE Access, respectively) and the only other touchpoint between the AI and CI categories was the bioRxiv repository. Looking more closely at the crossover of AI and collective intelligence (AI+CI), it becomes apparent that there is a higher degree of overlap in both journals and conferences between AI and AI+CI than with CI. This is unsurprising given the focus on technical topics and narrow disciplinary range revealed by our analysis above. AI+CI publications are thus more likely to be influenced by trends in AI research than by trends in CI. Although all of our categories had publications in arXiv in their top 20, AI and AI+CI publications were more likely to share subject labels than CI and AI+CI (e.g. arXiv computation and language and arXiv learning were the two labels with the highest number of publications). This may influence the co-discoverability of papers on the platform which, alongside the “light” peer review process used by arXiv, affects the likelihood of researchers becoming aware of each other’s work.
Our analysis reveals the need for:
Unless we incentivise more imaginative uses of AI, ones that help us make the most of distributed human intelligence, we may end up thwarting the opportunities opened up by digital technologies and smart machines. Collective human intelligence and AI are intimately connected and mutually dependent: already, most AI that we encounter in our everyday lives relies on collective human labour, some of it entirely invisible. By embedding CI principles into all stages of the AI development pipeline and drawing on the insights from Sociology, Psychology, Political Science and others to inform the ways we integrate AI into our society, we are more likely to avoid what has been called ‘the wrong kind of AI’ and ensure that the technology is maximised for the public interest.
We collected data from Microsoft Academic Graph (MAG) - a scientific database with more than 236M documents. We queried MAG with fields of study related to collective intelligence and artificial intelligence and retrieved all of the publications and their metadata that were published between 2000 and 2020 and contained at least one of them. We enriched this dataset by geocoding author affiliations, identifying open access journals and non-industry institutions.
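The category assignment described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the analysis: the field-of-study labels, the `Paper` record and the `categorise` function are hypothetical stand-ins, since the exact MAG fields queried are not listed here.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed, illustrative field-of-study labels (not the actual query terms).
CI_FIELDS = frozenset({"collective intelligence", "crowdsourcing", "citizen science"})
AI_FIELDS = frozenset({"artificial intelligence", "machine learning", "deep learning"})


@dataclass(frozen=True)
class Paper:
    """Hypothetical minimal record for one publication."""
    title: str
    year: int
    fields: frozenset


def categorise(paper, start=2000, end=2020):
    """Return 'CI', 'AI', 'AI+CI', or None if the paper is out of scope."""
    if not (start <= paper.year <= end):
        return None
    has_ci = bool(paper.fields & CI_FIELDS)
    has_ai = bool(paper.fields & AI_FIELDS)
    if has_ci and has_ai:
        return "AI+CI"  # intersection of the two fields
    if has_ci:
        return "CI"     # collective intelligence without AI
    if has_ai:
        return "AI"     # artificial intelligence without CI
    return None


papers = [
    Paper("Crowd labelling at scale", 2016, frozenset({"crowdsourcing", "deep learning"})),
    Paper("Volunteer bird surveys", 2009, frozenset({"citizen science"})),
    Paper("Image classification", 2018, frozenset({"machine learning"})),
    Paper("Early expert systems", 1995, frozenset({"artificial intelligence"})),
]
counts = Counter(c for c in map(categorise, papers) if c is not None)
print(counts)
```

The three categories are mutually exclusive by construction, which is what allows AI-only publications to serve as a separate comparison group.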
Although our database utilises MAG’s expansive coverage of academic knowledge, it comes with certain limitations. Some papers were missing important information about the journal or conference where they had been published. Moreover, although funding is an important enabler of research, MAG does not contain this information.
Our final dataset contained 34,233 publications in CI and 4,887 in AI+CI. We used publications in AI as a control group in our analysis. This subset of publications was substantially larger at 806,334.
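As a quick sanity check on these figures, the short sketch below adds up the CI and AI+CI counts behind the "almost 40,000" headline number and compares them with the size of the AI control group. The variable names are illustrative only.

```python
# Publication counts reported for the final dataset.
ci_count = 34_233
ai_ci_count = 4_887
ai_count = 806_334

# Combined size of the CI and AI+CI samples.
crossover_total = ci_count + ai_ci_count

# How many times larger the AI control group is than the combined sample.
ai_to_sample_ratio = ai_count / crossover_total

print(f"CI + AI+CI publications: {crossover_total:,}")
print(f"AI control group is about {ai_to_sample_ratio:.1f}x larger")
```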