As more of our economic, social and civic interactions come to be carried out by algorithms – from credit markets and health insurance applications to recruitment and criminal justice systems – so too have concerns increased about the lack of transparency behind the technology.
AI-based systems are often opaque, hard-to-scrutinise ‘black boxes’, leaving individuals with little understanding of how decisions about them are made.
My quest to increase algorithmic accountability led me towards explanations: I hoped to find a legally binding right guaranteeing that important algorithmic decisions affecting people’s lives have to be explained. Unfortunately, my research has shown that, from a legal perspective, we have a long way to go.
Working with fellow academics Brent Mittelstadt, a data ethicist, and Chris Russell, a machine learning expert, I have been tackling the question of what good explanations might look like and whether they are technically feasible. Our recent paper on the concept of ‘counterfactual explanations’ – explaining why a negative decision has been made and how circumstances would have had to differ for a desirable outcome – has been cited, and the ideas behind it implemented, by Google.
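To make the idea concrete, the sketch below shows one simple way a counterfactual could be computed for a toy loan model. It is an illustration rather than the implementation described in our paper: the synthetic data, the feature names, the plain L1 distance and the weighting parameter `lam` are all assumptions made purely for this example.

```python
# Illustrative sketch of a counterfactual explanation: find a small change to
# a refused applicant's features that flips a toy loan model's decision.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy loan data: two features (think income and existing debt), label 1 = approve.
X = rng.normal(size=(500, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

x_orig = np.array([-0.5, 0.8])  # a hypothetical applicant the model refuses


def objective(x_cf, lam=0.1, target=1.0):
    # Trade off pushing the predicted approval probability towards `target`
    # against staying close to the original applicant (plain L1 distance here;
    # `lam` is an illustrative weighting, not a recommended value).
    p = model.predict_proba(x_cf.reshape(1, -1))[0, 1]
    return (p - target) ** 2 + lam * np.abs(x_cf - x_orig).sum()


x_cf = minimize(objective, x_orig, method="Nelder-Mead").x

print("original decision:      ", model.predict(x_orig.reshape(1, -1))[0])
print("counterfactual features:", np.round(x_cf, 2))
print("counterfactual decision:", model.predict(x_cf.reshape(1, -1))[0])
```

The difference between the original and counterfactual features is the explanation itself: it tells the applicant what would have had to be different for the loan to be approved, without exposing the model’s internals.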
However, our work is far from done. Although this is a major step forward, explanations of decisions are just one side of the coin in achieving true algorithmic accountability: explanation does not equal justification or legitimacy.
We know that big data and algorithms are increasingly used to assess us, predict our behaviours and preferences, and ultimately make important decisions about us. Algorithms can infer our sexual orientation, political stance and health status without us being aware. They also decide what products or newsfeeds are shown to us, as well as whether we get hired, fired or promoted, if we get a loan or insurance, among myriad other things.
Algorithms do this by drawing inferences from highly diverse and feature-rich data (such as our web browsing history or social network interactions). These inferences are often invasive, counterintuitive and non-verifiable, and we are unable to predict, understand or refute them. Yet they shape our identities and reputations and steer our paths in life. Data-driven practices thus create new opportunities for discriminatory and biased decision-making.
But do data protection laws offer meaningful control over how we are being evaluated by algorithms? Even though their purpose is to protect our private lives, EU law and jurisprudence are currently failing to protect us from the novel risks of inferential analytics. Ironically, these inferences tend to relate to low-priority personal data, which receive the least protection in law but which pose perhaps the greatest risks in terms of privacy and discrimination. Inferences are effectively ‘economy class’ personal data in Europe.
As we show in our latest paper, in standing jurisprudence the European Court of Justice (ECJ) has consistently restricted data protection law to the assessment of inward personal data such as name, age or email address, rather than outward data such as inferences, opinions or assessments like credit scores. Critically, the ECJ has likewise made clear that data protection law is not intended to ensure the accuracy of decisions and decision-making processes involving personal data, or to make these processes fully transparent.
If a person feels unfairly treated, recourse must be sought through formal procedures applicable to their individual case. However, very often, especially in the private sector, the way decisions are made remains within the private autonomy of the decision-maker, with limited anti-discrimination regulation to govern how decisions are made and what criteria are relevant, justified and socially acceptable.
At the root of this problem is that data protection laws focus too much on the moment when data is collected but hardly at all on what happens after it has been obtained. For example, sensitive personal data on race or sexual orientation enjoys higher protection than other types, while anonymised data does not fall under data protection law at all. This stems from the idea that we can foresee potential risks and consequences when data is collected.
But this idea loses its value in the age of big data analytics. Supposedly anonymised data can often be re-identified, and non-sensitive information can turn into sensitive information: postcodes, for example, can be used to infer race or sexual orientation. It is therefore time to focus on output data and the potential impact of data analysis.
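The sketch below illustrates this proxy effect with entirely synthetic data: a classifier that sees nothing but a postcode still predicts a sensitive attribute well above chance, because district membership correlates with it. The population size, number of postcodes and correlation strengths are assumptions made purely for illustration.

```python
# Illustrative sketch: a 'non-sensitive' field (postcode) acting as a proxy
# for a sensitive attribute in a fully synthetic population.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_people, n_postcodes = 5000, 50

# Each synthetic district has its own base rate of the sensitive attribute,
# so knowing only the postcode already carries information about it.
postcode = rng.integers(0, n_postcodes, size=n_people)
base_rate = rng.uniform(0.1, 0.9, size=n_postcodes)
sensitive = (rng.random(n_people) < base_rate[postcode]).astype(int)

# One-hot encode the postcode: the model never sees the attribute directly.
X = np.eye(n_postcodes)[postcode]
X_tr, X_te, y_tr, y_te = train_test_split(X, sensitive, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"accuracy inferring the sensitive attribute from postcode alone: "
      f"{clf.score(X_te, y_te):.2f}")  # well above the ~0.5 chance level
```

Nothing in this toy pipeline touches data that current law would treat as sensitive at the point of collection; the sensitivity only emerges in the inferred output.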
We need more focus on how, why and for what purpose data is processed, and to work on standards for inferential analytics – such as inferring political views or (mental) health status based on browsing behaviour – that are robust and socially acceptable.
We have made several recommendations on how to close these accountability gaps and guard against the novel risks of big data and AI. These include:
• Recognition that the right to privacy is more than just ‘data protection’ – it is about identity, reputation, autonomy and informational self-determination.
• Dealing with (new) intellectual property and trade secret laws that could hinder AI transparency by providing extensive protection of commercial interests attached to the technical processes involved.
• A focus on how data is evaluated, not just collected, with a standard for the ‘right to reasonable inferences’.
• Statistically reliable data and methods for ‘high-risk’ inferences – that is, inferences that are privacy-invasive or potentially reputationally damaging, with low verifiability due to being predictive or opinion-based.
In the same way as it was necessary to create a ‘right to be forgotten’ online in a big data world, we believe it is now necessary to create a ‘right to how we are seen’. This will help us seize the full potential of AI and big data while protecting individuals and their fundamental rights.
Dr Sandra Wachter, Lawyer and Research Fellow, Oxford Internet Institute