At Nesta we have become a leader in applying data science methods to build evidence in innovation policy, health policy, the creative economy and the arts. But as our capacity grows, do we need to start questioning how far we can go with these methods? The ethics of data science is a grey area, but that doesn't mean we should shy away from engaging with it.
I was recently in a meeting with a client who was concerned that using data science methods might tarnish their organisation’s image, or even fall foul of EU data protection law. Such concerns are increasingly common in the wake of the Cambridge Analytica scandal. Beyond talk of foul play in the Brexit referendum and the US election, other stories of algorithmic mismanagement, such as the ‘flash crash’ of the pound in 2016, have sown public mistrust in data science methods and the use of public data.
Modern data science has emerged from the triple helix of the World Wide Web, Computational Methods and Artificial Intelligence. The World Wide Web, as an infrastructure, has enabled the rapid sharing and merging of large volumes of ‘Big’ data, facilitated by computational methods. Artificial Intelligence, in turn, allows information to be disentangled from data, with applications including identifying spam emails, translating text (as Google Translate does) and finding important people in networks.
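To make the last of these examples a little more concrete, here is a minimal sketch in Python of what ‘finding important people in networks’ can look like in practice. It uses the networkx library and an entirely made-up collaboration network; it illustrates the general technique rather than code from any Nesta project.

```python
# A minimal sketch of "finding important people in networks":
# rank members of a small, entirely hypothetical collaboration network
# by PageRank centrality using the networkx library.
import networkx as nx

# Hypothetical collaboration edges between people
edges = [
    ("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"),
    ("Carol", "Dan"), ("Dan", "Erin"), ("Erin", "Alice"),
]
graph = nx.Graph(edges)

# PageRank gives higher scores to well-connected, well-positioned people
scores = nx.pagerank(graph)
for person, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{person}: {score:.3f}")
```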
These seemingly uninterpretable methods can be combined and applied (unscrupulously or naively) to misuse data on a colossal scale. Even though it would never be our intention to misuse data against the public interest, Nesta needs to build a strategy for “doing data science” without violating our principles, namely:
We do data science so that we can achieve Nesta’s core goals by making sense of the World Wide Web, by generating new insights from old resources, and ultimately by broadening the evidence base for policymakers and society. In our context, concerns arise about how personal (or sensitive) data might be used. The kinds of sensitive information we might encounter include academic histories, ethnicities, ages and social networks. One set of ethical issues arises when you consider that biases genuinely present in this data can produce biased AI. Separately, it is important to question whether an individual would object to their personal data being used in unforeseen ways.
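To illustrate the first of these concerns, the toy sketch below (using entirely synthetic data, not drawn from any Nesta project) shows how a bias that already exists in historical outcomes is faithfully reproduced by a model trained on them: the classifier’s positive prediction rates end up differing by group simply because the training data did.

```python
# A toy illustration of how bias in training data produces biased AI.
# The data is entirely synthetic: historical outcomes are skewed in
# favour of one group, and a classifier trained on them learns the skew.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)      # hypothetical sensitive attribute (0 or 1)
skill = rng.normal(size=n)         # a legitimate predictive feature
# Historical outcomes carry a built-in advantage for group 0
outcome = (skill + 0.8 * (group == 0)
           + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

features = np.column_stack([skill, group])
model = LogisticRegression().fit(features, outcome)
predictions = model.predict(features)

# The model reproduces the disparity present in the historical outcomes
for g in (0, 1):
    rate = predictions[group == g].mean()
    print(f"group {g}: positive prediction rate = {rate:.2f}")
```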
It should also be said that much of our work doesn’t involve any personal data, such as our work on job classification with the ONS. In other work, however, we have used large numbers of academic papers, including authorships; patent data, including inventors; Meetup data, including member IDs; and Twitter data, including Twitter handles. Should the ethical alarm bells start ringing already? These data underpin analyses such as Arloesiadur, Creative Nation and The Westminster Twitterverse - and few would claim that these were ethically contentious.
It seems, therefore, that if we are transparent about what data we hold, ethical dilemmas arise less from the data we use and more from what we might do with it.
From the monoliths to the minnows, organisations have been churning out ethical frameworks left, right and centre. Take Google’s AI charter, which sets out very broad principles to guide its future selection of projects. But it doesn’t come close to explaining what data, methods and AI Google will use - presumably so that it can accommodate a broad business portfolio. At the other end of the spectrum, DataKind has developed a very specific set of questions which its volunteer data scientists should ask themselves before making decisions. This is great, but (and I caveat this “but” with a nod to the resource constraints of charities) it is not obvious to me that the data scientist in question is necessarily best placed to give their own work the green light. In contrast to Google’s approach, this kind of charter perhaps shifts too much corporate responsibility onto individual data scientists.
Constructive criticism aside, I acknowledge that coming into the race a little later means that we benefit from hindsight. Building on the work of others, I therefore propose the following: we will draw up a very concise, plain-language data science charter which will both guide our work and provide clarity to our stakeholders, including members of the public.
The latter point particularly resonates with my experience of a public dialogue, commissioned by Nesta’s inclusive innovation team, which I attended last week. My personal takeaways from this dialogue are as follows:
It was particularly interesting to note that, whilst data seems to be the primary concern among experts, at first glance the public appears to be more concerned about methods. There may be good reasons for this: perhaps it is more immediately obvious how algorithms can directly affect our day-to-day lives, or perhaps experts are simply more actively engaged in the wider discussion. I don’t think this needs to be a sticking point, since these are two sides of the same coin: if you can’t produce an output for ethical reasons, then the data you can use will also be restricted.
Nesta has been discussing the ethics of AI for some time. In a recent non-technical workshop, it became apparent that being clear on our ethical boundaries is increasingly important as Nesta leads the way in using data science to address sensitive aspects of human behaviour.
There was a strong feeling that Nesta should retain some corporate responsibility for data science, even though the total number of data scientists at Nesta is relatively small. One approach which we are considering is the creation of a small non-technical data science ethics panel. The panel would evaluate risks and opportunities for Nesta, in terms of impact and reputation. Once a data source or analytical technique has been approved, it would be added to a public-facing list, which would offer public accountability.
A key consideration will be applying ‘everyday ethics’, where possible, to data science projects. Let’s take an everyday example: it would be ethically dubious to stand in a restaurant guessing the ethnicities of customers. The data science equivalent might be predicting the ethnicity of customers from their names on Twitter. I acknowledge that the debate is more nuanced than this, but this approach at least applies a relatable ethical baseline.
As we move towards a data science charter, it is clear to me that Nesta should offer one that balances corporate and individual responsibility whilst remaining publicly accountable. In short: if our work affects how democratically elected representatives make decisions, then we must accept responsibility for our algorithms and interpretations. Furthermore, we must acknowledge genuine concerns that our increasingly algorithmic society is becoming opaque. On this front, Nesta must play its part in building public confidence.
In the next two months we will draft a charter, drawing upon the work of others, our public dialogue, our internal discussions and your comments. In August we will present the draft charter to our peers in other organisations (if that’s you, please get in touch!). After this, we aim to have the charter ratified by our board of trustees.
Culture and ethics are always evolving, and so the debate on data science ethics will never be settled. The red lines in this grey area are contextual and personal and, with that in mind, we would like to know what you think! What would you like to see from a data science charter from us, Google or anyone else?