Understanding the factors that can impact local areas, individual choices and, ultimately, obesity outcomes can help us to design and target interventions to achieve our goal of halving obesity rates by 2030.
Our food and drink decisions are informed by the places we live, work and learn. What is available, convenient and affordable, as well as how products have been marketed, promoted and packaged, depends on location.
In this project, we have explored how we can combine publicly available data and analyse it with data science methods to group locations into different categories depending on their similarities. This involves a machine learning technique called “clustering” that automatically groups locations together, in this case when they share socio-demographic characteristics and environmental factors. We chose this approach because we thought it could help uncover hidden patterns in the data that could be hard to detect manually.
First, we looked for data that might be useful for clustering locations. We chose sources based on their availability, relevance, ease of access and geographic detail (eg, the level to which local area data is broken down – regions, councils, boroughs or wards). The data sources we used include National Child Measurement Programme (NCMP) childhood obesity prevalence, the CDRC Healthy Assets and Hazards dataset (capturing access to food environments and health-related environmental factors in local areas), ONS Median house prices paid for England and England Health Indices of Multiple Deprivation.
Having collected, cleaned and merged these data, we clustered locations in England based on their similarities and differences. We then used interpretable machine learning methods to identify the factors that underpin the differences between clusters. Our geographical unit of analysis is the “Lower Layer Super Output Area” (LSOA), a small official geography with an average population of 1,500 people.
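To make the clustering step more concrete, here is a minimal sketch of one common clustering method, k-means, applied to a handful of invented areas. It is an illustration only: the text above does not say which clustering algorithm the project used, and the area codes, feature values and the choice of two clusters are all assumptions made for the example.

```python
import math
import random
import statistics

# Hypothetical toy records: (area code, deprivation score, median house price in GBP).
# The codes and values are invented for illustration; they are not project data.
AREAS = [
    ("A001", 42.0, 180_000),
    ("A002", 39.5, 195_000),
    ("A003", 44.1, 170_000),
    ("A004", 12.3, 520_000),
    ("A005", 10.8, 560_000),
    ("A006", 14.0, 495_000),
]


def standardise(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: assign points to the nearest centre, then recompute centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points
        ]
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(statistics.mean(c) for c in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        if new_centres == centres:  # assignments have stopped changing
            break
        centres = new_centres
    return labels


# Standardise first so that house prices (large numbers) do not dominate distances.
features = standardise([(dep, price) for _, dep, price in AREAS])
labels = kmeans(features, k=2)
clusters = {code: label for (code, _, _), label in zip(AREAS, labels)}
```

In this toy example the three deprived, lower-priced areas end up in one cluster and the three wealthier areas in the other. Real data would have many more features and areas, and the number of clusters would need to be chosen with care.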
The map below shows the clusters that different LSOAs in London are assigned to. It shows that most neighbourhoods in the city are in clusters five (blue: dense urban areas with high prevalence of child obesity) and one (orange: wealthier areas with lower prevalence of child obesity). These preliminary results illustrate geographical inequalities in child obesity as well as some of their potential drivers.
Our analysis of the clusters shows that measures of deprivation, house prices and pollution are the most significant factors behind their differences. Access to the retail environment, GPs and other physical places was also important. The findings are not entirely unexpected, as other studies have shown that obesity rates vary across local areas. However, by creating this model, we are able to use data to start understanding the differences between locations that might be leading to certain outcomes.
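One simple way to see which factors separate clusters is to compare the average value of each feature across clusters, measured in standard deviations. The sketch below does this for invented data. It is a stand-in for illustration, not the interpretable machine learning method used in the project, which is not specified here. The feature names, values and cluster labels are all hypothetical, and `noise_feature` is deliberately constructed to carry no information.

```python
import statistics

# Hypothetical rows: (cluster label, {feature name: value}). All values are invented.
ROWS = [
    (0, {"deprivation": 42.0, "house_price": 180_000, "noise_feature": 0.9}),
    (0, {"deprivation": 39.5, "house_price": 195_000, "noise_feature": 1.1}),
    (0, {"deprivation": 44.1, "house_price": 170_000, "noise_feature": 1.0}),
    (1, {"deprivation": 12.3, "house_price": 520_000, "noise_feature": 1.0}),
    (1, {"deprivation": 10.8, "house_price": 560_000, "noise_feature": 0.9}),
    (1, {"deprivation": 14.0, "house_price": 495_000, "noise_feature": 1.1}),
]


def cluster_separation(rows):
    """Score each feature by the gap between cluster means, in standard deviations."""
    scores = {}
    for name in rows[0][1]:
        values = [feats[name] for _, feats in rows]
        sd = statistics.pstdev(values)
        by_cluster = {}
        for label, feats in rows:
            by_cluster.setdefault(label, []).append(feats[name])
        means = [statistics.mean(v) for v in by_cluster.values()]
        # Largest minus smallest cluster mean, scaled by the overall spread.
        scores[name] = (max(means) - min(means)) / sd if sd else 0.0
    # Return features ordered from most to least separating.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


ranking = cluster_separation(ROWS)
for name, score in ranking.items():
    print(f"{name}: {score:.2f}")
```

Features with a large score differ strongly between clusters, while a score near zero means the clusters look alike on that feature. A ranking like this describes differences between clusters; it does not show that a factor causes them.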
Although our work is at an early stage, it suggests that geography matters when it comes to obesity. Whether a place is urban or rural, and how close it is to the city centre, is an important factor. This may be linked to the time it takes to travel to different parts of the food environment, and to the type and volume of retailers in an area.
We want to expand our data by adding information about food retailers and purchasing, as well as adult obesity. We will use these new data and a fine-tuned methodology to segment neighbourhoods into groups, with the goal of understanding the local drivers of obesity that could inform policy interventions.