Contact: Paul Hodgson, GIS and Infrastructure Manager, GLA
The City Data Analytics Programme is a virtual hub co-ordinated by the GLA’s City Intelligence Team in City Hall. It develops and supports data collaborations across public services in London.
Origins and funding
Supported by the GLA’s Intelligence Unit and using the London Datastore as a shop window, the City Data Analytics Programme is built on connections made through the Borough Data Partnership (and previous pan-London data agreements). It was formally set up by the Mayor of London in 2017 following a pilot with Nesta and several London boroughs throughout 2016-17.
The City Data Analytics Programme has now secured £365,000 of investment (including gifts in kind) from City Hall, the London Fire Brigade, the Centre for Urban Science and Progress (CUSP) in London, and Sharing Cities, a Horizon 2020 EU-funded project.
Vision and objectives
Created to support the development, commissioning and implementation of data science projects across public sector organisations in the Greater London area, the City Data Analytics Programme is a collaborative and convening institution in which data science projects and ideas are formed, tested and executed. Determined to change the culture around data analytics in the public sector, the GLA, through the City Data Analytics Programme, provides support with project management, legal issues, technical aspects and data science, as well as partnerships within the wider GLA family.
The key objectives of the City Data Analytics Programme are to test the policy or service impact of data science; show that data sharing is possible and has tangible benefits; develop data sharing protocols that will be useful in the long term; and identify barriers to collaborative working and develop solutions. Together, these contribute to the development of a culture of data sharing within London.
As part of its objectives, the City Data Analytics Programme also supports the analytical capacity and technical development of borough data officers through a “City Data Academy”, ensuring that the data science talent within London’s public services teams is used to maximum effect, and that capacity and knowledge from the wider data ecosystem are applied in a way that delivers benefits to all.
Governance
Inspired by the model of the Mayor's Office of Data Analytics (MODA) in New York, the City Data Analytics Programme Project Board consists of representation from across the public sector in London alongside academia.
The board includes:
- Theo Blackwell, London CDO (Chair)
- Paul Hodgson and Vivienne Avery, GLA Intelligence Unit representatives
- Dr Simon Miles, Director of CUSP London (Centre for Urban Science and Progress) at King’s College
- Guy Ware, Director of Finance, Performance and Procurement, London Councils
- Representation from one or two of the boroughs
Because of the wide range of topics that data science can address, and the differing types, sizes and motivations of the organisations able to contribute to individual exercises, it is anticipated that, within a broad framework, each City Data Analytics Programme project will be organised around its own individual model and will focus on predictive analytics.
Team structure
The GLA team consists of 36 people in total and is based in City Hall, with the necessary resources, technology and expertise to conduct data science projects. Staff deliver ODA projects alongside other commitments, amounting to an estimated 3.0 FTE.
Roles:
- Project development and management
- Information governance officer
- Two data scientists
The City Data Analytics Programme does not intend to rely on external sources for data science and analysis, although this is a route that is available for suitable projects. Relying solely on external data scientists was found to be costly, and the insights are not retained by the public sector to inform future work. A data scientist has therefore been hired to work on pan-GLA projects as well as on City Data Analytics Programme projects. Two data engineers will also soon be joining the team to work on platform development.
At the beginning of the project, the City Data Analytics Programme engaged external organisations such as Nesta to help it identify a pilot (and run workshops to refine its definition), and also acquired external data science expertise. CUSP is currently one of its analytics partners.
Working practices
Information Sharing: ISG (Information Sharing Gateway)
Data Sharing: On-cloud, encrypted data
Data Storage: AWS*, PostgreSQL, CKAN
Languages used: R, Python
Data visualisation tools: Tableau, D3.js
Additional technologies: GIS, NoSQL**
* The GLA hosts data on AWS (Amazon Web Services); it previously used Witan
** under development
✓ Penetration testing undertaken on their storage to ensure data can’t be compromised
✓ Dedicated data protection officer
✓ Information Commissioner’s Office engaged in their work
One of the important principles of the City Data Analytics Programme is that there will never be a single warehouse for all of London’s data. Data will always be connected on a project basis, following the principles of openness that have existed since the launch of the London Datastore in 2010. This means that the City Data Analytics Programme will adopt an open-source approach, always sharing knowledge with other cities and publishing, where possible, its algorithmic code using open APIs and common standards.
Data projects
The initial City Data Analytics Programme pilot focused on Houses of Multiple Occupation (HMOs). Twelve boroughs took part, and the pilot was supported by the GLA, Nesta and ASI Data Science.
Identifying Houses of Multiple Occupation
Problem: It is difficult to direct building inspectors efficiently towards unlicensed Houses of Multiple Occupation (HMOs). Firstly, only 10 to 20 percent of London’s HMOs are currently licensed, representing a missed revenue opportunity for local authorities at a time when public sector budgets are tight. Secondly, unlicensed HMOs are the likely locations of some of the capital’s worst and most exploitative housing conditions. Identifying more of them could raise money and help protect vulnerable tenants.
Solution: A model algorithm has been developed and implemented to enable boroughs to identify and take action against unlicensed HMOs, in order to protect vulnerable tenants, issue more HMO licences, and prosecute rogue landlords who fail to comply after initial warnings.
Workshops were run, initially focusing on the two boroughs of Westminster and Lambeth, in which building inspectors identified the features that characterise HMOs.
Like many front-line workers, building inspectors can provide a long list of risk criteria, honed over many years of experience. In the case of HMOs, they might suggest judging risk based on features such as the height of a property, its age, location, or whether the living accommodation is above a shop or restaurant.
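As an illustration only, criteria of this kind can be written down as a simple additive risk score. The sketch below is not the pilot's model: the `Property` fields, the thresholds and the weights are all hypothetical values chosen for the example, and location is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Property:
    storeys: int            # height of the building in storeys
    build_year: int         # year of construction
    above_commercial: bool  # living accommodation above a shop or restaurant


def risk_score(prop: Property) -> float:
    """Sum hypothetical weights for each risk criterion the property meets."""
    score = 0.0
    if prop.storeys >= 3:        # assumed threshold for a "tall" property
        score += 1.0
    if prop.build_year < 1919:   # assumed cut-off for an "old" property
        score += 0.5
    if prop.above_commercial:
        score += 1.5
    return score


def rank_for_inspection(properties: list[Property]) -> list[Property]:
    """Order properties from highest to lowest risk score."""
    return sorted(properties, key=risk_score, reverse=True)


candidates = [
    Property(storeys=2, build_year=1985, above_commercial=False),
    Property(storeys=4, build_year=1900, above_commercial=True),
    Property(storeys=3, build_year=1960, above_commercial=False),
]
ranked = rank_for_inspection(candidates)
```

A hand-weighted score like this is easy to explain to inspectors, but the weights are guesses; a data-driven model aims to learn them from licensing records instead.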
Outcomes and lessons learnt: As expected, identifying HMOs is not a simple problem, and each borough records data differently. Data availability varies across London, as does the interpretation placed on top of the base level of licensing, which in turn affects the types of HMOs that are licensed in each borough. A detailed report has been published by the GLA and Nesta, highlighting the high variation in data availability and quality across boroughs and the difficulty of running randomised controlled trials to cross-check the effectiveness of the algorithm in different environments.
Another complicating factor is that in the first iteration of the pilot the HMO problem was only “half-labelled”, meaning that the data showed properties that definitely were HMOs, but not those which were “definitely not HMOs”. Based on this analysis, an adapted balanced random forest method was then introduced to detect anomalies in the data. The report also lists the specific lessons learnt and the recommendations that surfaced through the pilot.
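To make the “half-labelled” setting concrete, the sketch below shows one common way of handling it: repeatedly draw a balanced sample of known HMOs and unlabelled properties, treat the unlabelled draw as provisional negatives, fit a small tree, and score each unlabelled property only in the rounds where it was left out. This is an assumption-laden illustration in the spirit of a balanced random forest, not the adapted method used in the pilot; the one-split trees, the two features (storeys, above a shop) and the toy data are all hypothetical.

```python
import random


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature, allowing an inverted rule."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        # Count rows classified correctly by "predict 1 when value >= threshold".
        correct = sum((row[feature] >= threshold) == bool(label)
                      for row, label in zip(rows, labels))
        for polarity, score in ((1, correct), (-1, len(rows) - correct)):
            if best is None or score > best[0]:
                best = (score, threshold, polarity)
    return feature, best[1], best[2]


def predict_stump(stump, row):
    feature, threshold, polarity = stump
    above = row[feature] >= threshold
    return int(above if polarity == 1 else not above)


def pu_balanced_scores(positives, unlabelled, n_rounds=200, seed=0):
    """Score unlabelled rows by out-of-bag votes from balanced bootstrap stumps."""
    rng = random.Random(seed)
    n_features = len(positives[0])
    votes = [0] * len(unlabelled)
    counts = [0] * len(unlabelled)
    for _ in range(n_rounds):
        # Balanced draw: as many unlabelled rows as positives, with replacement.
        pos_sample = [rng.choice(positives) for _ in positives]
        drawn = [rng.randrange(len(unlabelled)) for _ in positives]
        rows = pos_sample + [unlabelled[i] for i in drawn]
        labels = [1] * len(pos_sample) + [0] * len(drawn)
        stump = fit_stump(rows, labels, rng.randrange(n_features))
        in_bag = set(drawn)
        for i, row in enumerate(unlabelled):
            if i not in in_bag:  # only score rows the stump did not train on
                votes[i] += predict_stump(stump, row)
                counts[i] += 1
    return [v / c if c else 0.0 for v, c in zip(votes, counts)]


# Toy data: each row is (storeys, above_shop). Values are invented.
known_hmos = [(4, 1), (3, 1), (4, 0), (5, 1)]
unlabelled = [(2, 0), (1, 0), (2, 0), (1, 0), (2, 0), (1, 0),  # ordinary homes
              (4, 1), (5, 1)]                                  # hidden HMOs
scores = pu_balanced_scores(known_hmos, unlabelled)
```

Properties that resemble the known HMOs should end up with high scores even though they were never labelled, which is the kind of ranking an inspection team could act on.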
Another ongoing project, in partnership with the Alan Turing Institute, is a long-term data approach to better model air pollution in the capital. This will be achieved by collating existing and new data sources (such as medium and low-cost sensors) and enhancing how they are analysed through probabilistic modelling. This pilot will complement the modelling already undertaken in London, which adopts a mechanistic approach (in the case of the London Atmospheric Emissions Inventory) and relies on more traditional data sources.
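A very small example of what probabilistic treatment of mixed sensors can mean: if each reading is modelled as the true concentration plus independent Gaussian noise, and the prior is flat, the combined estimate is the precision-weighted mean, so noisier low-cost sensors count for less. This is a textbook sketch under those assumptions, not the project's model; the function name and the readings are hypothetical.

```python
def fuse_readings(readings):
    """Combine (value, variance) pairs by inverse-variance weighting.

    Returns the fused mean and its variance, assuming independent
    Gaussian noise on each reading and a flat prior.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    precision = sum(1.0 / variance for _, variance in readings)
    mean = sum(value / variance for value, variance in readings) / precision
    return mean, 1.0 / precision


# Hypothetical readings: a reference-grade monitor (variance 4)
# and a low-cost sensor (variance 16) measuring the same location.
fused_mean, fused_variance = fuse_readings([(40.0, 4.0), (50.0, 16.0)])
```

The fused variance is always smaller than the smallest input variance, which is why adding even noisy sensors can tighten the estimate.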
The City Data Analytics Programme’s initiatives will add to the existing ecosystem of ways for London’s boroughs and public services to innovate in data sharing and standards.
Work plans for the future
The City Data Analytics Programme is currently working on the two projects described above and intends to move the HMO project from the pilot phase into operation.
For the selection of future pilots, the GLA is also working on a list of possible problem statements, which will soon be published on the London Datastore. The list will include an indication of project ideas, challenges and operational priorities from a wide cross-section of departments and public organisations, and it will be used to select future pilots. A rationale already exists that helps to visualise the data project typologies in which the City Data Analytics Programme is likely to get involved.
The list of projects will be prioritised, regularly refreshed and published on the London Datastore from autumn 2018, to help external organisations, such as universities, to identify opportunities for collaboration.