Last year, the UK research lab DeepMind announced that its AI system, AlphaFold 2, can predict a protein’s 3D structure with an unprecedented level of accuracy. This breakthrough could enable rapid advances in drug discovery and environmental applications.
Like almost all AI systems today, AlphaFold 2 is based on ML techniques that learn from data to make predictions. These ‘prediction machines’ are at the heart of internet products and services we use every day, from search engines and social networks to personal assistants and online stores. In years to come, ML is expected to transform other sectors including transportation (through self-driving vehicles), biomedical research (through precision medicine) and manufacturing (through robotics).
But what about fields such as healthy living, early years development or sustainability, where our societies face some of their greatest challenges? Predictive ML techniques could also play an important role there – by helping identify pupils at risk of falling behind, or by personalising interventions to encourage healthier behaviours. However, their potential in these areas is still far from being realised.
'Why are we able to develop machine learning techniques to understand the structure of proteins, yet not to help social workers identify and support children at risk?'
For example, a recent paper by the What Works Centre for Children’s Social Care, which tested the accuracy of ML in children’s services, showed that these models performed poorly at identifying children at risk. And in public service contexts where ML has been applied more widely, such as the criminal justice system and policing, its use has raised important concerns around fairness, justice and safety for minorities and vulnerable groups.
This situation reminds me of the puzzle posed by the US economist Richard Nelson in his 1977 book, The Moon and the Ghetto. He asked why the US was able to send an astronaut to the moon yet unable to tackle deprivation and exclusion in its own cities. We could pose a similar question about machine learning: why are we able to develop ML techniques to understand the structure of proteins and transform how we access and interact with information online, yet not to help social workers identify and support children at risk?
This is an important question for Nesta’s Data Analytics practice. In the last five years, we have applied data science and ML in fields including the arts, research and innovation policy, and skills and jobs. As we begin working on our new strategy, we know that ML has an important role to play in advancing Nesta’s ambitious missions – but we still have much to learn about what has worked (and not worked) in other sectors. In the rest of this blog, I draw some lessons from sectors where ML has already been deployed with transformative – but not always positive – impacts and consider their implications for Nesta’s Data Analytics practice.
ML techniques extract patterns from data, which they then use to make predictions about new situations. This can be done in several ways: supervised methods learn from labelled data; unsupervised methods learn from unlabelled data (e.g. by identifying clusters of similar observations in a dataset); and reinforcement learning methods learn how to achieve a goal through trial and error in simulated environments. Technology companies have a clear rationale for investing in these techniques: they help serve their users more relevant search results, adverts and recommendations, which are more likely to elicit a response and hence to keep people on their websites for longer.
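To make the distinction concrete, here is a minimal sketch in Python using scikit-learn on synthetic data (all names and numbers are illustrative, not drawn from any real project): a supervised classifier trained on labelled data, contrasted with an unsupervised clustering method that works without labels.

```python
# Minimal, illustrative contrast between supervised and unsupervised learning.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # 200 synthetic observations, 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # labels, only needed for the supervised case

# Supervised: learn a mapping from features to known labels.
clf = LogisticRegression().fit(X, y)
print("predicted labels:", clf.predict(X[:5]))

# Unsupervised: group similar observations into clusters, no labels required.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments:", km.labels_[:5])
```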
In theory, one can think of many opportunities to deploy similar ML techniques to tackle societal challenges – for example, to identify families exposed to food insecurity, or to tailor digital education solutions to the needs and preferences of different learners. But realising these opportunities requires careful analysis of organisations and communities in order to identify high-value use-cases for ML predictions.
Unfortunately, organisations working on the frontline of societal challenges often lack the time, resources and expertise both to explore how ML could help them achieve their mission and to engage with ML researchers and developers from other sectors. This can lead to missed opportunities and slow the deployment of ML for social good. It can also create the risk that ML interventions adopt a ‘solutionistic’ mindset towards complex societal challenges, failing to engage sufficiently with communities of beneficiaries and ignoring the fact that, on many occasions, ML will not be the right method for the problem at hand.
ML techniques have to be trained on large amounts of data. This explains why so many were first developed and deployed on the internet, which is awash with data that its users contribute either voluntarily (e.g. when they submit an article to Wikipedia or publish a post on social media) or involuntarily (e.g. when they carry out a search on Google or purchase a product on Amazon). All this data is complemented by public datasets curated by academic institutions and the public sector. An important example is ImageNet, a dataset created by Stanford University’s Professor Fei-Fei Li which has been used to train pioneering computer vision models. In a similar vein, AlphaFold 2 was trained on a public dataset of 170,000 proteins whose 3D structures had already been determined using experimental methods.
Detailed and timely data is harder to come by in sectors such as health and education – partly for good reasons, such as concerns about data protection and privacy that have often been neglected by technology companies, but also because of poor data collection and quality control, fragmentation in standards, and an unwillingness to share data for commercial or organisational reasons (including fear of revealing mistakes and bad decisions). And while statistical offices such as the ONS in the UK are increasingly opening up their official data, this is generally only at high levels of aggregation that are less suitable for the personalisation and targeting applications at which ML excels.
We are starting to see promising initiatives to address this situation, including platforms for secure data sharing such as XPRIZE’s Data Collaboratives, and the Lacuna Fund, which supports the creation of labelled datasets in low- and middle-income countries. However, as social ML innovators start accessing and creating training datasets for their models, they need to be mindful of biases that could lead to discriminatory outcomes against minorities who are underrepresented or unfairly represented. In the same way that online text has been found to encode sexist and racist stereotypes, health and education data is very likely to reflect histories of prejudice and inequality that could bias any ML models trained on it.
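One simple check this implies is comparing a model’s performance across groups. The sketch below uses placeholder arrays purely for illustration – in practice the labels, predictions and group markers would come from a real held-out dataset – to surface performance gaps that might indicate bias.

```python
# Illustrative subgroup performance check (placeholder data only).
import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=300)                 # placeholder ground-truth labels
y_pred = rng.integers(0, 2, size=300)                 # placeholder model predictions
group = rng.choice(["group_a", "group_b"], size=300)  # placeholder group markers

# Report accuracy separately for each group; large gaps warrant investigation.
for g in np.unique(group):
    mask = group == g
    print(g, "accuracy:", round(accuracy_score(y_true[mask], y_pred[mask]), 3))
```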
Data is of little value without ML algorithms to process and extract valuable insights from it. The success of ML in the internet economy is largely associated with the advent of deep learning: a powerful technique that loosely imitates the way our brain’s neural networks operate, in order to identify complex patterns in unstructured datasets such as images, videos and text.
Two important digital trends pioneered by internet firms have made these algorithms widely available: first, the release of open-source implementations in popular tools such as PyTorch and TensorFlow; and second, the development of cloud computing infrastructures providing access to the storage and processing power required to train these algorithms without massive investments in fixed capital.
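As a flavour of what these open-source tools provide, here is a minimal sketch of a small neural network trained in PyTorch on synthetic data; the architecture and numbers are arbitrary and purely illustrative.

```python
# Minimal PyTorch training loop on synthetic data (illustrative only).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

X = torch.randn(256, 10)                 # synthetic features
y = (X[:, 0] > 0).float().unsqueeze(1)   # synthetic binary labels

for _ in range(100):                     # short training loop
    optimiser.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimiser.step()

print(f"final training loss: {loss.item():.3f}")
```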
Data scientists and ML researchers looking to tackle societal challenges in other fields can tap into these open-source efforts. But as they do, they should reflect on whether complex and opaque deep learning algorithms that excel at pattern matching are suitable for informing decisions in domains such as health and education, which involve sensitive data and where being able to understand the reason for an outcome is paramount.
Other ML techniques based on statistical methods (such as linear models or decision trees) and causal inference may be more suitable, because they respectively rely on fewer predictors and simpler algorithms (making them easier to interpret), and require data scientists to specify a causal model linking predictors to outcomes (reducing the risk of capturing spurious correlations in the data). Another advantage of these methods is that they tend to require less data for training, and are less likely to ‘overfit’ their training set (learning every quirk of this data but losing robustness when applied to new situations).
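The sketch below illustrates this trade-off on synthetic data with scikit-learn (the model choices and numbers are illustrative): a small linear model, whose coefficients can be read directly, cross-validated alongside a more flexible ensemble.

```python
# Comparing a simple, interpretable model with a more flexible one on limited data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))   # a deliberately small synthetic dataset
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=150) > 0).astype(int)

simple = LogisticRegression()
flexible = GradientBoostingClassifier()

# Cross-validation gives a more honest picture than training accuracy alone.
print("simple model CV accuracy:  ", cross_val_score(simple, X, y, cv=5).mean())
print("flexible model CV accuracy:", cross_val_score(flexible, X, y, cv=5).mean())

# The linear model's coefficients can be read as the weight given to each
# predictor, which makes its behaviour easier to explain.
simple.fit(X, y)
print("coefficients:", simple.coef_.round(2))
```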
ML researchers follow the ‘Common Task Framework’ to evaluate the techniques they develop. In this approach, different models are evaluated against the same dataset using metrics that capture the quality of a model’s predictions. This makes it possible to compare the performance of different techniques objectively, and creates incentives for researchers to improve and combine methods to beat the state-of-the-art. Along similar lines, the structural biology community has designed metrics to evaluate model performance and grand challenges where research teams compete to predict the structure of new proteins (AlphaFold 2 recently outperformed all rivals in one of these challenges).
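The sketch below gives a toy version of this idea on synthetic data: two candidate models are trained on the same training split and scored on the same held-out test set with the same metric, so their performance can be compared directly.

```python
# Toy 'common task' evaluation: same data split, same metric, different models.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("logistic regression", LogisticRegression()),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    model.fit(X_train, y_train)
    score = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {score:.3f}")
```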
In fields addressing societal challenges, ML-oriented prediction competitions are few and far between. One interesting example is the Fragile Families Challenge, organised by researchers at Princeton University to test the effectiveness of different ML models at predicting life outcomes for children from deprived backgrounds, using survey data collected over 15 years. The Fragile Families team provided secure access to this data to hundreds of researchers, who tried to predict key outcomes – such as a child’s grade point average at age 15, and whether a family would be evicted from its home – using a variety of ML models, both simple and complex. In general, these models performed poorly, highlighting the difficulty of predicting long-term individual outcomes and the importance of publishing negative results so that these difficulties are recognised by researchers and policymakers.
As they develop and adopt new metrics of progress, social innovators should be aware of the side-effects of becoming obsessed with metrics. In the case of ML, the field’s single-minded focus on pushing the state-of-the-art has led to the neglect of techniques that are less reliant on big data and more explainable, environmentally sustainable and robust. On the commercial side, social media platforms’ quest to optimise narrow metrics of value, such as user engagement, has resulted in increasing political polarisation and enabled the spread of misinformation. Charles Goodhart’s observation that any indicator used to measure performance becomes a target for manipulation and gaming applies to algorithmic systems as much as it does to human organisations.
ML has achieved its greatest impacts in sectors where it is easier and cheaper to move ideas from lab to application, and to run experiments that yield robust data about the impact of different products and services. The ideas with the greatest potential can then be scaled up into production (relatively) cheaply using on-demand cloud computing resources. The resulting feedback loop has increased the incentives to invest in ML by boosting the expected reward from those investments – it is easy to identify and scale the ideas that are profitable, and to stop or adapt, and learn from, those that prove unsuccessful.
While rigorous experimentation is at the heart of mainstream efforts to tackle societal challenges around health, education and international development, there are fewer examples of projects combining experimental methods with ML, or of testing ML recommendations for social good in online settings. One interesting case of the former is the International Rescue Committee’s experiment to test the impact of different job assistance programmes on outcomes for Syrian refugees in Jordan; this uses ML to assign participants to different interventions, taking into account what has been learned about which interventions are more suitable for different groups. We have also seen innovative uses of simulation to compare ML predictions against a human baseline in bail decisions, a context where it would be unethical to run controlled experiments.
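To illustrate the general idea (not the IRC’s actual method), the sketch below uses entirely synthetic data to fit one outcome model per intervention, and then suggests, for a new participant, the intervention with the highest predicted outcome; every name and number is hypothetical.

```python
# Hypothetical sketch: per-intervention outcome models used to suggest assignments.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
interventions = ["training", "cash_grant", "mentoring"]   # hypothetical options

# Hypothetical historical data: participant features, intervention received, outcome.
X = rng.normal(size=(300, 4))
assigned = rng.choice(interventions, size=300)
outcome = X[:, 0] + (assigned == "training") * X[:, 1] + rng.normal(size=300)

# Fit one outcome model per intervention, using the participants who received it.
models = {i: LinearRegression().fit(X[assigned == i], outcome[assigned == i])
          for i in interventions}

# For a new participant, predict the outcome under each intervention and
# flag the one with the highest predicted benefit.
new_person = rng.normal(size=(1, 4))
predictions = {i: m.predict(new_person)[0] for i, m in models.items()}
print("suggested intervention:", max(predictions, key=predictions.get))
```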
Thinking more widely, we believe there are many opportunities to use ML to personalise and adapt interventions that are currently delivered using a ‘one size fits all’ approach – the standard model for public service delivery. A key challenge, however, is to do this in a way that is beneficent and respectful of individual autonomy.
Over the coming months and years, we will draw on Nesta’s unique combination of subject expertise around our missions and technical expertise in innovation methods to effectively deploy ML to tackle social challenges, while mitigating its risks.
We will work closely with our mission teams, frontline workers, people with lived experience, and other Nesta practices such as Design and Collective Intelligence Design to identify beneficial and inclusive ML use-cases to advance our missions. We will actively look for method mixes that enrich and strengthen the evidence generated by ML techniques (including through experiments to evaluate their impacts), while ensuring our metrics of progress are holistic and inclusive, and do not encourage manipulation and cheating.
We will assess the state of the data ecosystems in the mission fields where we operate, looking for strategic opportunities to create novel, high-quality datasets and data collection infrastructures that lower the barriers to adopting ML in a secure and ethical way. As part of this data generation and curation effort, we will document mission-related data using approaches such as ‘Datasheets for Datasets’, which record the provenance and composition of a dataset in a standard form so that everyone is aware of its potential uses and limitations. We will also release the code, tools and models we create so they can be reviewed and built on by others, contributing to the technical infrastructure that enables ML projects in the mission areas where we work.
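As a sketch of the kind of information such a record captures (field names paraphrased from the ‘Datasheets for Datasets’ proposal; the dataset itself is hypothetical), a minimal datasheet might look like this:

```python
# Hypothetical, minimal datasheet-style record for a mission-related dataset.
datasheet = {
    "name": "example_mission_dataset",          # hypothetical dataset name
    "motivation": "Why the dataset was created and by whom",
    "composition": "What the instances represent; known gaps and biases",
    "collection_process": "How and when the data was gathered; consent obtained",
    "preprocessing": "Cleaning and labelling steps applied to the raw data",
    "recommended_uses": "Tasks the data is suitable for",
    "limitations": "Uses to avoid; known risks to individuals or groups",
}
```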
All our ML work will be informed by strong data justice and ethical principles, and will draw on a growing number of participatory ML approaches being developed to involve relevant communities – ensuring that ML interventions address those communities’ needs, that risks are identified and mitigated, and that the work carries greater legitimacy. We will seek to build ML capabilities among the stakeholders and communities we work with – empowering them not only to understand these methods and apply them to address their needs, but also to challenge their use where negative impacts or unacceptable risks arise.
Nesta has much to learn from other sectors as we start deploying ML methods to tackle our ambitious missions around a better beginning, a healthy life and a sustainable future. We believe that by combining multiple disciplines and by being open with our data and code, and ethical and justice-oriented in our project-selection and development processes, we can create a recurring cycle of responsible innovation that augments machine learning’s contribution to the common good. In future, we expect ML to help social workers, teachers and nurses in their jobs, just as it is already contributing to our understanding of the building blocks of life.