Bias in AI court decision making – spot it before you fight it


Machine learning in court decisions

The use of machine learning in decision-making processes, including judicial practice, is becoming increasingly frequent. Because court decisions have a great impact on an individual’s personal and professional life, as well as on society as a whole, it is important to be able to identify, and ideally rectify, bias in an AI system so that the model does not render an unfair or inaccurate decision. The benefits of any imperfect solution, even one potentially better than a traditional court ruling, need to outweigh the flaws and risks involved.


The goal behind using machine learning in court decision-making is to make the decisions and the decision-making process better: more accurate and just, as well as faster and less costly. Perhaps surprisingly, an AI model can actually help judges uncover and fight their own bias. The system can alert the judge when, based on historical statistical data, it detects in the judge’s language that he or she is less attentive, is about to make a snap decision or is being less empathetic. The model does so by, among other things, weighing external factors that could affect the decision, such as the time of day, the temperature or even the fact that elections are approaching. For example, judges have been reported to suffer from so-called glucose depletion, which severely impairs their decision-making capacity. In a study of parole judges in Israel described by Daniel Kahneman, the judges were more likely to grant parole after a meal break: approvals spiked to about 65 % after each meal, whereas on average only 35 % of parole requests were approved (Daniel Kahneman, Thinking, Fast and Slow, 2011).

Status quo

Obviously, at the current stage of technology, only certain court decisions, or certain parts of court decision-making, can be made by algorithms. Some level of AI has been used in so-called predictive policing, where, based on the available data, the algorithm helps the police or the court decide on a particular aspect of the case, e.g. granting parole, deciding on bail or determining the appropriate sentence. The courts could use such software, for example, to assess the risk that the defendant would commit another crime while on parole, whether he or she would appear for the court date if bail is granted, or whether probation should be considered. Additionally, machine learning has also been used in the actual rendering of the verdict. These cases usually involve small civil law disputes, including overturning or deciding on parking fines. Estonia has recently unveiled its pilot robo-judge that would adjudicate disputes involving small monetary claims.

Bias in AI court decisions

Although the advantages are numerous, a risk-averse approach to the deployment of AI in court decision-making is crucial. Before launching any algorithm to replace a human judge in rendering a verdict, we have to be sure it will render a decision at least as just and justifiable as the human judge would. The European Commission’s High-Level Expert Group on AI considers an AI system to be trustworthy when it is lawful, ethical and robust.

The biggest concern when talking about ethical AI is the presence of bias, be it in the algorithm itself or in the data, which can affect and distort the calculation and prediction process.

  • Bias can result from mistakes in sampling and measurement that leave the data incomplete, based on too few observations or simply wrong. This is the case when negligence in the data collection or production process results in bad data being used. Such bias can in theory be corrected by revisiting the data collection and production system and including the missing data or replacing the bad data. However, if the data is missing because it simply does not exist in the first place, the bias is more difficult to correct. That can happen when, for example, certain types of crimes are in practice not investigated, and thus a group of offenders is not prosecuted, because of biased police practices.
  • The data can also carry prejudice that reflects inequalities in society. The most prevalent underlying biases concern racial and gender inequalities, as well as those related to a person’s social background and sexual orientation. This can be reflected in the content itself as well as in the language of the data. For example, the way the case facts and circumstances or the defendant’s actions are described can carry information about the defendant’s race or social class. An algorithm that predicts the risk associated with a defendant’s behaviour, built on historical data from a district with a higher level of racial intolerance, could reflect law enforcement’s disproportionate targeting of African Americans, resulting in an overrepresentation of such data in the final pool of collected data.
  • Additionally, bias can be caused by dirty data (as opposed to clean, good-quality data), that is, data that reflects or is influenced by fraudulent information, falsified documents, planted evidence or other manipulated or unlawful facts. Such bias, if uncovered, moves the use of the whole AI system into a potentially illegal sphere and increases the accountability risk for the developer and the user.
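
One simple way to spot the kind of over-representation described in the second point is to compare each group’s share of the dataset with its share of a reference population. The following is a minimal Python sketch under stated assumptions: the function name representation_ratios, the field name "group", the group labels and all the numbers are hypothetical and invented for illustration. A ratio far from 1.0 is only a signal to look more closely at how the data was collected, not proof of bias in itself.

```python
from collections import Counter


def representation_ratios(records, group_key, reference_shares):
    """Return, per group, the dataset share divided by the reference share.

    A ratio well above 1.0 suggests the group is over-represented in the
    data relative to the reference population; well below 1.0 suggests
    under-representation. What counts as "well above" is a policy choice.
    """
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to analyse")
    ratios = {}
    for group, reference_share in reference_shares.items():
        data_share = counts.get(group, 0) / total
        ratios[group] = data_share / reference_share
    return ratios


# Hypothetical toy data: 100 case records and invented population shares.
records = [{"group": "A"}] * 60 + [{"group": "B"}] * 40
reference_shares = {"A": 0.30, "B": 0.70}

ratios = representation_ratios(records, "group", reference_shares)
for group, ratio in sorted(ratios.items()):
    print(f"group {group}: {ratio:.2f} times its reference share")
```

In this toy example group A makes up 60 % of the records but only 30 % of the reference population, so it appears twice as often as expected. A check like this says nothing about data that was never collected at all, which is why the first kind of bias is harder to detect.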

Fighting the bias

A model built upon bad or dirty data runs the risk of further propagating discrimination and inequalities in society by widening the gap between its output (the decision) and social values (equal treatment, just process etc.), ultimately rendering an unjust or inaccurate decision. Discovering and eliminating unfair bias (as opposed to bias introduced on purpose) before the model is deployed in court decision-making, or in the worse scenario after, is vital.

As the AI systems used in the public sphere, including by courts, are developed predominantly by private companies, they do not carry an inherent commitment to protecting justice or human rights unless explicitly programmed that way. In other words, the system is not ethical unless made to be so.

AI model due diligence

Ideally, AI systems, given their great potential impact on our lives and our basic human rights, would be regulated and overseen in a similar way to e.g. air traffic, the health system or legal practice. However, such a regulatory regime for now seems more of a utopia than something achievable in the short term. In the meantime, to make sure that an AI system deployed now renders as fair and accurate a decision as possible, the software has to be subject to an ongoing audit process.

No matter how accurate the system seemed when first deployed, the court using it would have to make sure, on an ongoing basis, that it is performing consistently and rendering fair decisions comparable in quality to those of human judges. Although introducing a corrective measure to eliminate an identified bias could be feasible in theory, practice has shown that making an AI system ethical by artificially introducing certain values is very difficult.
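
As an illustration of what one recurring audit check could look like, the Python sketch below compares, group by group, how often people who did not reoffend were nevertheless flagged as high risk (the false positive rate). This is only one of several possible fairness measures, and choosing it is an assumption made here, not something the audit process prescribes. The field names, the toy cases and the tolerance MAX_GAP are all hypothetical.

```python
def false_positive_rate(cases):
    """Share of people who did NOT reoffend but were flagged as high risk."""
    negatives = [c for c in cases if not c["reoffended"]]
    if not negatives:
        return None  # rate is undefined without any non-reoffenders
    flagged = sum(1 for c in negatives if c["flagged_high_risk"])
    return flagged / len(negatives)


def audit_by_group(cases, group_key="group"):
    """Compute the false positive rate separately for each group."""
    groups = {}
    for c in cases:
        groups.setdefault(c[group_key], []).append(c)
    return {group: false_positive_rate(cs) for group, cs in groups.items()}


def make_case(group, flagged_high_risk, reoffended):
    return {"group": group, "flagged_high_risk": flagged_high_risk,
            "reoffended": reoffended}


# Hypothetical audit sample: past predictions joined with observed outcomes.
cases = (
    [make_case("A", True, False)] * 4 + [make_case("A", False, False)] * 6
    + [make_case("A", True, True)] * 5
    + [make_case("B", True, False)] * 2 + [make_case("B", False, False)] * 8
    + [make_case("B", True, True)] * 5
)

MAX_GAP = 0.10  # hypothetical tolerance; the real value is a policy decision

rates = audit_by_group(cases)
defined = [r for r in rates.values() if r is not None]
gap = max(defined) - min(defined)
needs_review = gap > MAX_GAP

for group, rate in sorted(rates.items()):
    print(f"group {group}: false positive rate {rate:.0%}")
print(f"gap {gap:.0%}, needs review: {needs_review}")
```

Running such a check on every new batch of decided cases, rather than once at deployment, is what makes the audit ongoing. A gap above the tolerance does not say how to fix the model; it only tells the court that a human needs to look.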

Individual case due diligence

Following an audit of the AI model itself, the second layer of the overall AI court decision-making model is ensuring that it is verifiable from the perspective of the parties concerned. We should not forget that an AI system does not feel or care, so should it render an unjust decision, it would not be conscious of it. Nor does it provide reasons for how it arrived at one verdict or another. It thus inherently lacks the core value that would make it trustworthy in the eyes of the parties to the case: explainability.

AI explainability, or explicability, refers to a due diligence process that enables the parties concerned to ask for an explanation of a machine learning decision that has a legal or other significant impact on them, and potentially to contest it. Although it is part of a general auditing obligation, this measure also entails the parties’ right to access, to the extent possible and reasonable, the data used and the information generated by the AI model.

In practice, it may not always be easy or even possible to unveil the reasoning behind a decision made by an AI model. Often, the prediction models with the highest level of explainability, such as decision trees and classification rules, lack prediction accuracy, and vice versa: neural networks are typically very accurate, yet very opaque as to how they make their calculations. Nevertheless, even when deep learning is used, auditing the AI model as a whole and ensuring the highest possible traceability of the decision-making process would provide some level of transparency and thus explicability.
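
To make the contrast concrete, here is a minimal Python sketch of the explainable end of the spectrum: a handful of classification rules that return not only a recommendation but also the list of rules that fired. The rules, point values, field names and threshold are entirely hypothetical and are not taken from any real risk assessment tool; the point is only that every step of the result can be shown to the parties and contested.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    description: str
    applies: Callable[[dict], bool]
    points: int


# Hypothetical rules, invented for illustration only.
RULES = [
    Rule("prior failure to appear in court",
         lambda c: c["prior_failures_to_appear"] > 0, 2),
    Rule("pending charge at time of arrest",
         lambda c: c["pending_charge"], 1),
    Rule("stable residence for at least a year",
         lambda c: c["years_at_address"] >= 1, -1),
]


def assess(case, threshold=2):
    """Return a recommendation, the score and a human-readable trace."""
    fired = [rule for rule in RULES if rule.applies(case)]
    score = sum(rule.points for rule in fired)
    recommendation = "higher risk" if score >= threshold else "lower risk"
    explanation = [f"{rule.points:+d}: {rule.description}" for rule in fired]
    return recommendation, score, explanation


# Hypothetical defendant record.
case = {"prior_failures_to_appear": 1, "pending_charge": True,
        "years_at_address": 3}

recommendation, score, explanation = assess(case)
print(recommendation, score)
for line in explanation:
    print(" ", line)
```

A neural network offers no such built-in trace, which is why logging its inputs, model version and outputs for each case becomes the fallback for traceability.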

Training in AI

As the AI systems used for court decision-making are most probably (and hopefully) not developed or implemented by the courts themselves but by an outside developer or vendor, the decision-makers, as well as the parties concerned, lack knowledge and understanding of how the system works and on what criteria it bases its decisions.

Also, the AI system may be used not to completely replace human judges and lawyers, but merely to complement them.

In either scenario, it would be beneficial if judges as well as lawyers had a good, or at least some, understanding of the AI model used, the input data and the predictive methods. It is neither feasible nor necessary to train all lawyers and judges right now in how neural networks work. However, as a first step, we could focus on training to uncover bias and dirty data in regular decision-making, so that legal professionals become more conscious of the use of discriminatory language or fraudulent data. A higher awareness of bias, combined with a basic understanding of machine learning processes, their benefits and their limitations, could help the parties involved make sense of the information provided through the due diligence and improve the use of the so far imperfect AI models in decision-making.

Systemic changes

Even if in-depth due diligence can be run before software is deployed, as well as during its use, in order to uncover and fix any potential bias, it would not be enough to render perfect, that is accurate and just, decisions. The collected data has a great chance of reflecting the existing injustice and inequalities in society, unless particular attention is paid to existing cultural and social norms and stereotypes when creating the dataset, and they are purposefully corrected for. Data collection would also need to take into consideration the possibility of the data being made «dirty» on purpose, that is, when it stems from wrongful and immoral law enforcement practices and policies.

Only if the process of data collection, production and labelling, and the use of the algorithm, follow well-defined rules and are overseen can it be ensured that all the actors participating in the decision-making, be it the judge, the clerks or the law enforcement authorities, strive for the highest level of accuracy and fairness when it comes to the data and the algorithm.

If we cannot deploy a bias-free system, an intermediate, potentially temporary solution could be to use AI systems as a complement to human decision-making. That way, we can speed up the judicial process, analyze the case facts in greater depth or save costs while still protecting justice. Just because a technology is available does not mean it should be used.
