There's an area where the BBC is already using an open and collaborative model for innovation: software development
On Monday, the BBC published British, Bold, Creative, a paper where it put forward a vision for its future based on openness and collaboration with its audiences and the UK’s wider creative industries.
In this blog post, we focus on an area where the BBC is already using an open and collaborative model for innovation: software development.
Although less visible to the public than its TV, radio and online content programming, the BBC’s software development activities may create value and drive innovation beyond the BBC, providing an example of how the corporation can put its “technology and digital capabilities at the service of the wider industry.”
Software is an important form of innovation investment that helps the BBC deliver new products and services, and become more efficient. One might expect that much of the software developed by the BBC would also be of value to other media and digital organisations. Such beneficial “spillovers” are encouraged by the BBC’s use of open source licensing, which enables other organisations to download its software for free, change it as they see fit, and share the results.[1]
Current debates about the future of the BBC - including the questions about its role in influencing the future technology landscape in the Government’s Charter Review Consultation - need to be informed by robust evidence about how it develops software, and the impact that this has.
In this blog post, we use data from the world’s biggest collaborative software development platform, GitHub, to study the BBC as an open software developer.[2]
GitHub gives organisations and individuals hosting space to store their projects (referred to as “repos”), and tools to coordinate development. This includes the option to “fork” (copy) other users’ software, change it and redistribute the improvements. Our key questions are:
But before tackling these questions, it is important to address a question often raised in relation to open source software:
There are several possible reasons:
The webpage introducing TAL (Television Application Layer), a BBC project on GitHub, is a case in point: “Sharing TAL should make building applications on TV easier for others, helping to drive the uptake of this nascent technology. The BBC has a history of doing this and we are always looking at new ways to reach our audience.”
We have identified 18 BBC organisations on GitHub, with 115 unique members - a level of activity that is far from insignificant. To put these numbers into context, the Government Digital Service has 41 members on GitHub, and Google has 514.[3] Other UK broadcasters are also much less active on GitHub than the BBC: ITV has 11 GitHub members, Channel 4 has 7, and Sky UK has 6.
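The distinction between memberships and unique members (see note [3]) can be illustrated with a minimal Python sketch. It assumes membership lists have already been collected for each organisation; the organisation and member names are hypothetical placeholders, not real data.

```python
def membership_counts(org_members):
    """Return (total memberships, unique members) across organisations."""
    # A person who belongs to two organisations contributes two memberships.
    memberships = sum(len(set(members)) for members in org_members.values())
    # The union of all member sets counts each person once.
    unique_members = len(set().union(*org_members.values()))
    return memberships, unique_members


# Hypothetical example: "bob" belongs to both organisations.
org_members = {
    "org-a": ["ann", "bob"],
    "org-b": ["bob", "cat"],
}
memberships, unique_members = membership_counts(org_members)
print(memberships, unique_members)
```

The gap between the two figures is the number of duplicate affiliations.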
Further analysis reveals six main BBC software development areas on GitHub.[4] News, R&D and Services are where most activity is concentrated. They are followed by Platforms, Archive and Mixed.[5]
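The allocation step described in note [4] can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the analysis: it takes the community assigned to each member as given (for example, from a community detection algorithm run elsewhere), it reads "majority" as more than half of an organisation's members, and all names in the example are hypothetical.

```python
from collections import Counter


def allocate_organisations(org_members, member_area, mixed_label="Mixed"):
    """Assign each organisation to the area holding most of its members.

    Assumption: "majority" means strictly more than half of the members
    with a known area; otherwise the organisation is labelled as mixed.
    """
    allocation = {}
    for org, members in org_members.items():
        counts = Counter(member_area[m] for m in members if m in member_area)
        total = sum(counts.values())
        if total == 0:
            allocation[org] = mixed_label
            continue
        area, top = counts.most_common(1)[0]
        allocation[org] = area if top * 2 > total else mixed_label
    return allocation


# Hypothetical organisations, members and community labels.
org_members = {
    "org-a": ["ann", "bob", "cat"],
    "org-b": ["cat", "dan"],
}
member_area = {"ann": "News", "bob": "News", "cat": "R&D", "dan": "Services"}
allocation = allocate_organisations(org_members, member_area)
print(allocation)
```

Here "org-a" goes to News because two of its three members sit in that community, while "org-b" is split evenly and falls into the Mixed category.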
We provide some examples of digital innovation in these different areas later, but before doing that, we look at how the levels of BBC activity in GitHub have changed over time.
There are currently 380 projects associated with BBC organisations – 298 of these are “original” (that is, not forks of other projects). We have also found 817 forks of BBC projects – instances where other users have copied BBC code to continue working on it.[6] The two charts below show the recent evolution in BBC projects and forks (by development area).
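As an illustration of how such counts can be derived, the Python sketch below separates original projects from forks and filters out forks that should not count as external interest (see note [6]). It assumes repository records have already been retrieved from the GitHub API, where each repository carries a boolean `fork` field and an `owner.login`. The sample records and account names are hypothetical.

```python
def split_originals(repos):
    """Split an organisation's repos into originals and forks of other projects."""
    originals = [r for r in repos if not r["fork"]]
    forked = [r for r in repos if r["fork"]]
    return originals, forked


def external_forks(forks, bbc_members, excluded_logins=frozenset()):
    """Keep forks whose owner is neither a BBC member nor an excluded account."""
    skip = set(bbc_members) | set(excluded_logins)
    return [f for f in forks if f["owner"]["login"] not in skip]


# Hypothetical repository records for one organisation.
repos = [
    {"name": "proj-a", "fork": False},
    {"name": "proj-b", "fork": True},
    {"name": "proj-c", "fork": False},
]
# Hypothetical forks of that organisation's projects.
forks = [
    {"owner": {"login": "alice"}},
    {"owner": {"login": "bbc-dev"}},
    {"owner": {"login": "bulk-forker"}},
    {"owner": {"login": "carol"}},
]

originals, forked = split_originals(repos)
kept = external_forks(forks, bbc_members={"bbc-dev"}, excluded_logins={"bulk-forker"})
print(len(originals), len(forked), len(kept))
```

Applied to the figures above, the same split implies that 82 of the 380 projects are themselves forks of other projects (380 minus 298).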
They show three things:
The graph below illustrates the variety of areas where the BBC is attracting interest from other users on GitHub, measured by fork numbers.
We have also looked at the location of GitHub users who have forked BBC projects (see map below). [10]
We find these users in 53 different countries. A third of them are based in the UK, and a fifth in the US.
London is the city with the largest number of people forking BBC projects (16%). Other active UK cities include Manchester, Leeds, Glasgow, and Edinburgh.
Internationally, the ‘hotspots’ of interest in BBC development include Paris, New York, San Francisco, Berlin, and Amsterdam. Interestingly, more than half of the cities in this “top 10” also show up in Compass’s recent ranking of global tech start-up ecosystems. One possible interpretation is that some software development activities at the BBC are highly innovative, attracting the interest of entrepreneurs in well-regarded tech clusters.
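Country and city shares of this kind can be tallied once user locations have been geocoded. The Python sketch below is a minimal illustration: it assumes each forker record already holds a `country` and `city` value (or `None` where geocoding failed), and that shares are computed over the forkers whose location was identified. The records shown are hypothetical.

```python
from collections import Counter


def location_shares(records, key):
    """Share of forkers per place, ignoring records with no value for `key`."""
    values = [r[key] for r in records if r.get(key)]
    counts = Counter(values)
    total = len(values)
    # Ordered from the most to the least common place.
    return {place: n / total for place, n in counts.most_common()}


# Hypothetical geocoded forkers; None marks a location that could not be resolved.
forkers = [
    {"login": "u1", "country": "United Kingdom", "city": "London"},
    {"login": "u2", "country": "United States", "city": None},
    {"login": "u3", "country": "United Kingdom", "city": "Leeds"},
    {"login": "u4", "country": None, "city": None},
]

country_shares = location_shares(forkers, "country")
city_shares = location_shares(forkers, "city")
print(country_shares)
print(city_shares)
```

Because fewer locations resolve to a city than to a country, the two sets of shares rest on different denominators.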
We have shown that the BBC has an important presence on GitHub, covering an expanding number of technology areas, such as web design, data journalism, data visualisation and content standards. These activities are garnering interest from significant numbers of GitHub users, not least developers in thriving tech start-up ecosystems in the UK, Continental Europe and the US.
But what does this mean for ongoing debates about the future of the BBC?
In its Public Consultation for the Charter Review, the Department for Culture, Media and Sport highlights how the UK has benefited from R&D at the BBC, while also mentioning concerns about “crowding out” technology investment in the private sector, and high costs.
In some ways, the open software development activities we analyse in this post appear to increase the public benefits from the BBC’s R&D while removing some of the risks:
Given all of this, and consistent with the open vision for the future of the BBC set out in British, Bold, Creative, our question is: how can the BBC use its considerable technological capabilities to maximise its impact on innovation, by making even greater use of the open source model we have studied in this post?
Note: Data collection and analysis for this blog post were done with R. Social network graphs were produced with Gephi. The scripts and data are available on GitHub.
[1] Several key components of modern ICT systems, such as the Apache server or the Linux operating system (which, for example, underpins Google’s Android mobile OS), and a multitude of programming languages and applications (including R and Gephi, the tools used for data collection, analysis and visualisation in this blog post) have been developed using an open source model.
[2] GitHub was founded in 2008 and, as of today, it has a community of 10 million users working on over 26 million projects. On GitHub, users can share their software code in “repos” (repositories for software projects), “fork” other users’ repos (create copies of the code that they can work on) and give back their improvements through “pull requests”. They can also subscribe to interesting repos, or “star” them (the equivalent of a bookmark). GitHub’s open Application Programming Interface (API) provides easy access to data about GitHub users and their organisational affiliations, repos and their contributors and forks, among other things.
[3] There are in fact 192 members affiliated to BBC organisations if we include duplicates (members affiliated to more than one BBC organisation). One thing to remember here is that an employee of an organisation can always participate in GitHub without being “officially” affiliated with that organisation on GitHub.
[4] In practice, this meant producing a social network where BBC members were connected if they were members of the same organisation. We then used a community detection algorithm to find distinct “components” in that network. One can use different algorithms to do this, and we opted for the one that broke up the network in the cleanest (most modular) way, the “leading eigenvector” method, which gave us five communities or development areas. Having done this, we allocated BBC organisations to the development area that contained most of their members. If a BBC organisation did not have a “majority” development area, we allocated it to a “Mixed” category, which brings the total to six development areas.
[5] The Mixed area includes BBC “crossover” organisations like iPlayer, which intuitively sits between BBC Services and Platforms, and BBC Connected Studio, which, because of its crosscutting nature, includes developers from a variety of BBC communities.
[6] This list of forks excludes forks by BBC members, as well as forks by a single individual, no longer active on GitHub, who had forked 87 BBC projects (including 74 in a single day).
[7] It is worth noting that one of the development areas (“Platforms”) that we identified previously is missing from these charts because it has no original repos or forks in GitHub. One potential explanation for this is that BBC developers in that area are using GitHub on a personal basis, or that they are collaborating in projects that are not being shared openly (premium GitHub users are given the option to keep their repos private).
[8] Of course, it could also be that development activities that were previously “private” (i.e. not open) or taking place on other platforms were relocated to GitHub.
[9] Forks are an imperfect proxy for interest in BBC code for two reasons. On the one hand, developers can download code from a GitHub repo without forking it, so looking only at forks would lead us to underestimate interest in the repo. On the other hand, forking a repo is easy: there is always the possibility that the individual who forked it did not carry out any subsequent development or use it in any meaningful way. One way to address this would be to look at levels of development activity in forks of BBC repos – this is an issue for further research.
[10] Just over two-thirds of GitHub users who have forked BBC projects provide information about their location (this captures 489 unique GitHub users, excluding members of BBC organisations). Where possible, we have used the Google Maps geo-coding API to identify their country and extract the geographical coordinates of their location for mapping. This allows us to deal, to some degree, with inconsistencies in the way that users provide their location. The geo-coding process has helped us to identify the countries for 423 forkers, and the localities for 316 forkers (we have excluded instances where the geo-coding process generated a large number of matches for a single location).
[11] See, for example, Moody (2001).
[12] http://www.scirp.org/journal/PaperDownload.aspx?paperID=53076