This blog calls into question the widely accepted technique of removing sensitive data from algorithms as a way of reducing bias and discrimination.
Artificial intelligence (AI) technologies are increasingly built into complex social systems such as criminal justice, lending, recruiting and insurance. When it comes to understanding the implications of AI in such contexts, reactions vary widely.
The more widespread the use of AI becomes, the more evident it is that serious problems can arise from the mismanagement of algorithmic bias.
Predictive policing tools, for instance, have repeatedly been found caught in runaway feedback loops of discrimination: the more a neighbourhood is patrolled, the more crime is recorded there, and the more patrols a model trained on those records will send back.
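To see how quickly such a loop can run away, here is a minimal toy simulation, not a model of any real policing system: two districts offend at exactly the same rate, the historical records are slightly skewed, patrols follow the records, and offences are only recorded where patrols go.

```python
import random

random.seed(0)

true_rate = [0.5, 0.5]   # both districts have the same underlying offence rate
recorded = [6, 5]        # but historical records are slightly skewed towards district 0

for day in range(200):
    # send the (single) patrol to the district with more recorded crime so far
    target = 0 if recorded[0] >= recorded[1] else 1
    # offences are only observed, and recorded, where the patrol actually goes
    if random.random() < true_rate[target]:
        recorded[target] += 1

print(recorded)  # roughly [106, 5]: the initial skew has snowballed
```

The skew never gets corrected, because the district that is never patrolled never generates the records that would rebalance the allocation.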
Common responses to this issue involve adjusting data inputs and correcting model design in order to create “neutral” models in which protected variables are omitted, or at least controlled for.
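In code, this omission step is often as simple as dropping a column before training. A minimal sketch, with entirely hypothetical column names and data:

```python
import pandas as pd

# Hypothetical defendant records; the column names and values are invented.
df = pd.DataFrame({
    "prior_arrests": [0, 3, 1, 7],
    "age":           [23, 35, 41, 29],
    "zip_code":      ["60624", "60614", "60624", "60614"],
    "race":          ["black", "white", "black", "white"],
    "reoffended":    [0, 1, 0, 1],
})

# "Debiasing" in its most common form: remove the protected attribute
# so that the model never sees it explicitly.
X = df.drop(columns=["race", "reoffended"])
y = df["reoffended"]
# ... any classifier can now be fitted on X and y.
```

The rest of this post questions how much this step, on its own, actually achieves.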
Here, I will refer to this process as “debiasing”, and I will question it for the following reasons:
It is not biased algorithms, but broader societal inequalities that drive discrimination in the real world.
What algorithms surface are patterns that need to be interpreted. And often, it is the specific use cases and applications of AI that should be questioned and debated, rather than the data used to build them.
In 2016, the COMPAS algorithm, an AI tool used in a number of US courtrooms to produce recidivism risk scores (i.e. estimates of whether a convicted person will break the law again), was found to falsely flag African American defendants as future re-offenders at almost twice the rate of white defendants, according to a ProPublica study. The same analysis found that white defendants were mislabelled as low risk more often than black defendants.
A number of responses followed, some addressing methodological issues in ProPublica’s study, others noting how competing notions of fairness constrain probabilistic models, making it impossible for a risk score to satisfy every fairness criterion for black and white defendants at the same time.
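This incompatibility is a matter of arithmetic. A standard confusion-matrix identity (Chouldechova, 2017) ties the false positive rate to the prevalence of re-offence in each group; the sketch below plugs in purely illustrative numbers, not the actual COMPAS figures, to show that equal calibration plus unequal base rates forces unequal false positive rates.

```python
def false_positive_rate(prevalence, ppv, fnr):
    # Confusion-matrix identity (Chouldechova, 2017):
    # FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)

# Suppose the score is equally well calibrated for both groups (same PPV)
# and misses re-offenders at the same rate in both groups (same FNR)...
ppv, fnr = 0.6, 0.35

# ...but the measured base rates differ (illustrative numbers only):
for group, prevalence in [("group A", 0.5), ("group B", 0.3)]:
    print(group, round(false_positive_rate(prevalence, ppv, fnr), 3))

# group A 0.433
# group B 0.186
```

Whichever criterion is equalised, another one breaks, which is why the dispute cannot be settled by the score alone.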
One thing in this debate is certain: US arrests are not race-neutral. On the contrary, evidence indicates that African-American people are disproportionately targeted in policing. As a result, US arrest record statistics are heavily shaped by this inequality.
Debiasing the COMPAS tool could therefore be considered an acceptable trade-off, if only the cost in algorithmic accuracy actually translated into fairer application.
However, more often than not, removing sensitive information simply leaves us with an imperfect tool implementing imperfect actions.
We’ve already seen how apparently innocuous data, such as addresses, can in fact be used in discriminatory ways. Similarly, lifestyle information such as smoking or drinking habits is not usually considered sensitive, but could become so if used to discriminate against employees in the workplace, all the more if such information is gathered without individual consent.
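One reason address data can do this is that it acts as a proxy: even after the protected attribute is removed, a remaining “innocuous” field may still predict it. A minimal sketch with invented data (in strongly segregated cities the effect is far stronger than here):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Invented records: `postcode` stays in the model, `race` has been removed from it.
df = pd.DataFrame({
    "postcode": ["A1", "A1", "A1", "B2", "B2", "B2"] * 10,
    "race":     ["black", "black", "white", "white", "white", "black"] * 10,
})

# How well does the supposedly innocuous field reconstruct the protected one?
X = OneHotEncoder().fit_transform(df[["postcode"]])
y = df["race"]
print(LogisticRegression().fit(X, y).score(X, y))  # ~0.67 even in this toy example
```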
In some countries, information on sexual orientation and gender, for instance, is particularly sensitive. This is the case even in places where progressive policies are in force, although freedom (and safety) of disclosure is constantly renegotiated. Quite notably, the ‘don’t ask, don’t tell’ policy was repealed during the Obama administration; barely eight years later, transgender people are under threat of being banned from serving in the military.
If we decide that fields need to be removed from the data, we should think not only about the nature of the bias but also about the why, when and how, and we should agree on what data should be considered sensitive in the first place.
Once we recognise the importance of context, we should worry less about what goes into algorithms and more about the consequences of their use.
Instead of fixating on modifying input data or classifying its degree of sensitivity, we should consider algorithms as parts of larger systems with specific tasks to accomplish. COMPAS had the great value of further exposing the disparity in treatment that African American people receive from US police. However, any use of COMPAS to measure recidivism is, in my view, problematic: it is a probabilistic tool, yet the consequences of its application, especially in cases of false findings, can have far-reaching effects on the lives of people charged with crimes.
Accepting the use of AI tools to decide on the future of individuals would implicitly mean accepting that a quantitative, positivist approach can be applied to human behaviour. However, the extremely complex reality of constantly changing factors interacting with and influencing human relations and reactions makes it incredibly difficult, if not impossible, to make accurate, nontrivial predictions.
Therefore, before asking ourselves whether race, gender and other protected attributes should or should not be included in algorithms, we need to discuss whether these are the right tools to use at all in cases that directly affect outcomes for individuals.