Suppose you applied for credit at a bank. In the meeting with the bank representative, you are told that your credit application has been denied and that you are not eligible for the loan you requested. When you ask for an explanation, you are informed that – based on the huge number of prior loans the bank has issued – the representative's intuition tells them that you are not a safe bet. Would you be happy with this explanation? No, you would not. Especially in a context where decisions are based on your personal information, you would demand a proper explanation.
Now, suppose that the representative was simply reading the output of some AI algorithm that has been fed with your data and thus does not know why this decision was made. Would that be acceptable? Certainly not! However, until very recently, the application of AI algorithms in many fields has led to this and similar scenarios playing out as described above.
Complex AI algorithms are often trained with personal data and deliver important decisions without the user or even the developer of the algorithm being able to explain how the AI arrived at a result. The information about why an algorithm decides the way it does is often contained in an incredibly complex and, for humans, initially opaque binary yes-no game. Understanding this game in a way that we can comprehend and describe it is an important challenge for users and developers of AI models because AI applications are becoming a bigger part of our everyday lives.
LLMs, legislation, and the need for explainability
Large Language Models (LLMs), such as the famous ChatGPT from OpenAI or its German counterpart Luminous from Aleph Alpha, are very recent and prominent examples of complex and intractable AI models. These and similar models are based on gigantic mathematical models that are optimized with large amounts of data over weeks and months. The result is the breathtaking language comprehension and production capability that rocketed LLMs into the center of public and commercial attention.
With the nearly endless possibilities opened by these new models, current AI systems are positioned to lead to the disruption of established business processes. However, the lack of a clear explanation of why an AI model gives a certain response to a specific prompt might restrict the number of possible applications. This has been a problem for AI applications in highly regulated industries such as finance and medicine for years, but it will also apply to an increasing number of sectors as soon as the European AI Act takes effect.
With the AI Act, the EU is responding to the technical achievement of these complex AI models as well as the socio-economic challenges that are emerging with it. At the same time, the AI Regulation is intended to set the course for the future of digitalization in Europe and to become an innovation driver for a globally active "AI made in Europe". An integral part of this new legislation is that AI algorithms applied in areas related to personal information and security need to provide traceable decisions to allow safeguarding against unintended biases or misuse of personal data.
XAI – Explainable AI in Large Language Models
This begs the question of how to allow companies to use cutting-edge AI technology in fields like finance, medicine, or human resources while still being compliant with strict and important regulations. The field of research trying to answer that question is Explainable AI or XAI. In XAI, researchers are trying to find a way to trace the decision logic of highly complex models, such as LLMs. The methods used to accomplish this are as varied as the different AI models. Generally, they are categorized into two broad approaches:
- Global explainability attempts to map and interpret an entire model. The aim is to understand the general decision logic of a model.
- Local explainability examines the decision of a model for a specific input to understand how this specific input leads to the corresponding output.
In scenarios like the one sketched above, we are trying to explain why a model gave a specific answer for a specific prompt or person. We are, therefore, interested in local explainability. How would we go about creating this for a large and complex language model? The first intuition is simply to build models whose answers are easy to trace, avoiding the problem altogether. While this is certainly possible for a lot of simpler AI applications, the size and complexity of LLMs prevent it. An alternative way to create explainability for LLMs is to employ perturbation-based methods. The idea here is that if we have the input given to a model and its response, we can start experimenting with different variations of the input and observe the effects this has on the responses.
To implement this strategy, we don’t have to know the precise inner workings of the model and can, to some extent, treat it as a black box. This can be done for LLMs by manipulating the saliency of different parts of the prompt given to them by modifying the attention the model allocates to them. By making certain parts of the prompt more/less salient, we can probe the effects of enhancement or suppression on the produced answer. The result is a clear visual representation of the parts of the prompt that had the largest impact on the response that was given.
Armed with this information, we can account for which parts of the prompt led the LLM to produce its answer. In practical terms, if an LLM is prompted to finish the text “Hello, my name is Lucas. I like soccer and math. Since the last few years, I have been working on…” and returns “…my degree in computer science”, we can empirically show that the male name Lucas and the word math have a strong impact on the response. As explained above, we do so by experimenting with the saliency of the different words. For instance, when we start to mask math, the response might switch to “…my own game, and I’ve been playing it for a while”. Repeating this many times, with different combinations of masked words, yields the attributions described above. This solution is integrated into the Aleph Alpha ecosystem and is called AtMan (Attention Manipulation).
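The perturbation idea can be sketched in a few lines of Python. Note that this is a simplified illustration, not the AtMan implementation: AtMan manipulates the attention the model allocates to tokens, whereas here we simply remove one word at a time. The `generate` function is a made-up stand-in for a real LLM call, canned to mimic the example above.

```python
# A minimal sketch of perturbation-based attribution. The generate()
# function is a hypothetical stand-in for an LLM: it returns a canned
# completion that depends on whether certain words appear in the prompt.

def generate(prompt: str) -> str:
    if "math" in prompt:
        return "my degree in computer science"
    return "my own game, and I've been playing it for a while"

def token_importance(prompt: str) -> dict:
    """Score each word by whether masking it changes the model's response."""
    baseline = generate(prompt)
    tokens = prompt.split()
    scores = {}
    for i, token in enumerate(tokens):
        # Perturb the input: drop one word and regenerate.
        masked = " ".join(t for j, t in enumerate(tokens) if j != i)
        # Importance 1.0 if removing the word flips the answer, else 0.0.
        scores[token] = 0.0 if generate(masked) == baseline else 1.0
    return scores

print(token_importance("I like soccer and math"))
```

With this toy model, only masking "math" changes the completion, so "math" receives the full importance score while "soccer" receives none – mirroring the Lucas example. A real setup would compare continuous output probabilities rather than exact string matches.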
Other XAI Methods
Methods that experiment with the input to an AI model are not unique to LLMs. Going back to the credit scoring scenario described above, we could also imagine a model that takes a range of characteristics of an applicant, like age, salary, assets, and marital status, and predicts if a credit should be granted.
By applying a method called SHAP (Shapley Additive exPlanations), we can deduce the decision logic of the AI model without having to know the model itself explicitly. Again, we want to deduce the influence each characteristic of the credit applicant has on the response of the model. SHAP works by permuting a person’s characteristics to analyze how important certain characteristics are for the AI model’s response. Permutation, in this case, means that the values for some characteristics are replaced with the values from other people. By doing this many times in many different combinations, we can estimate the effect each characteristic has on the response of the AI model.
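To make the permutation idea concrete, here is a deliberately simplified sketch. The rule-based `credit_model` and the applicant data are invented for illustration, and the attribution is far cruder than real SHAP, which averages over many feature coalitions to compute proper Shapley values (as in the `shap` Python library); here we just swap one characteristic at a time with another applicant's value and count how often the decision flips.

```python
# Made-up credit model: salary and assets dominate, age barely matters.
def credit_model(applicant: dict) -> int:
    score = 0
    score += 2 if applicant["salary"] > 40_000 else 0
    score += 1 if applicant["assets"] > 10_000 else 0
    score += 1 if applicant["age"] > 25 else 0
    return int(score >= 3)  # 1 = grant credit, 0 = deny

def feature_influence(applicant: dict, others: list) -> dict:
    """Estimate each feature's influence by swapping in other applicants' values."""
    baseline = credit_model(applicant)
    influence = {}
    for feature in applicant:
        flips = 0
        for other in others:
            # Replace one characteristic with another person's value.
            perturbed = dict(applicant, **{feature: other[feature]})
            flips += credit_model(perturbed) != baseline
        # Fraction of swaps that changed the model's decision.
        influence[feature] = flips / len(others)
    return influence

applicant = {"salary": 50_000, "assets": 20_000, "age": 30}
others = [
    {"salary": 20_000, "assets": 5_000, "age": 22},
    {"salary": 30_000, "assets": 50_000, "age": 60},
]
print(feature_influence(applicant, others))
```

For this toy model, swapping in a lower salary flips the decision every time, while swapping assets or age never does – the kind of per-characteristic attribution the article describes, obtained without opening the model itself.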
A word of caution is warranted, however. XAI methods like those described here explain the behavior of AI models and do not describe how the characteristics influence the real world. If an AI model has learned a faulty association between a person’s characteristics and creditworthiness from the data it was provided with, this association will also show up in the explanations produced by the XAI method. Nevertheless, if a model performs well and has learned the correct associations, XAI methods might also help us understand business problems better.
The Future of AI: Explainability, Trustworthiness, and Legislative Demands
In general, it can be said that there are ways to create explanations for the behavior of even the most complex AI models, such as LLMs. These explanations can help to employ AI solutions in a manner compliant with current legislation, such as the GDPR, and future legislation, such as the AI Act. However, XAI can only be part of a broader answer and must work in concert with other aspects of Trustworthy AI, such as data safety, robustness, and control.
Trustworthy AI is the guiding concept behind the AI Act and similar regulations, and companies would do well to establish processes that standardize the development of AI solutions in a way that is compliant with these guidelines. Over recent years, AI professionals have pushed more and more for platform solutions that provide this standardization out of the box and could potentially also include applications tracking the explanations generated by XAI approaches. In conclusion, we strongly encourage the development of platforms for AI development, including XAI, within organizations that ease compliance and help companies create innovative AI applications that are not hampered by constant worry about legislation.
Dr. Luca Bruder has been a Senior Data Scientist at Alexander Thamm GmbH since 2021. Luca completed his doctorate in the field of computational neuroscience and gained experience in AI and data science consulting alongside his doctorate. He draws on a wide range of experience in statistics, data analysis and artificial intelligence and leads a large project on the topic of Explainable AI and autonomous driving at Alexander Thamm GmbH. In addition, Luca is the author of several publications in the fields of modelling and neuroscience.
Please note: The opinions expressed in Industry Insights published by dotmagazine are the author’s or interview partner’s own and do not necessarily reflect the view of the publisher, eco – Association of the Internet Industry.