Maintaining Coherence in Explainable AI:
Strategies for Consistency Across Time and Interaction

Alan Dix1,2, Tommaso Turchi3, Ben Wilson2, Alessio Malizia3,4, Anna Monreale3 and Matt Roach2

1 Cardiff Metropolitan University, Wales, UK
2 Computational Foundry, Swansea University, Wales, UK
3 Department of Computer Science, University of Pisa, Pisa, Italy
4 Molde University College, Molde, Norway

Presented at SYNERGY Workshop on Hybrid Human-AI Systems at HHAI 2025, Pisa, Italy, 9th June 2025

Download full paper (pre-conference version, PDF, 1.5MB)


Can we create explanations of artificial intelligence and machine learning systems that maintain some level of consistency over time, as we might expect of a human explanation? This paper explores this issue and offers several strategies for either maintaining a level of consistency or highlighting when and why past explanations might appear inconsistent with current decisions.
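The strategies themselves are described in the full paper; as a purely illustrative sketch (not taken from the paper), the Python below shows one way a system might keep a ledger of past feature-attribution explanations and, when the explanation for the same case changes, either confirm coherence or flag a likely reason for the apparent inconsistency, such as an intervening model update. The class names, drift measure and threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ExplanationRecord:
    """A stored explanation: feature attributions plus the model version that produced it."""
    case_id: str
    model_version: str
    attributions: dict[str, float]  # feature name -> attribution weight


def attribution_drift(old: dict[str, float], new: dict[str, float]) -> float:
    """L1 distance between two attribution vectors over the union of their features."""
    features = set(old) | set(new)
    return sum(abs(old.get(f, 0.0) - new.get(f, 0.0)) for f in features)


@dataclass
class ExplanationLedger:
    """Keeps past explanations so new ones can be checked for apparent inconsistency."""
    history: dict[str, ExplanationRecord] = field(default_factory=dict)
    drift_threshold: float = 0.5  # assumed tolerance before an inconsistency is flagged

    def check(self, rec: ExplanationRecord) -> str:
        """Compare a fresh explanation with the stored one for the same case.

        Returns a message that either confirms coherence or highlights why the
        explanations may appear inconsistent (e.g. an intervening model update).
        """
        previous = self.history.get(rec.case_id)
        self.history[rec.case_id] = rec  # always remember the latest explanation
        if previous is None:
            return f"{rec.case_id}: no earlier explanation; recorded as baseline."
        drift = attribution_drift(previous.attributions, rec.attributions)
        if drift <= self.drift_threshold:
            return f"{rec.case_id}: consistent with the earlier explanation (drift={drift:.2f})."
        if previous.model_version != rec.model_version:
            return (f"{rec.case_id}: explanation changed (drift={drift:.2f}); "
                    f"the model was updated from {previous.model_version} to {rec.model_version}.")
        return (f"{rec.case_id}: explanation changed (drift={drift:.2f}) under the same model; "
                f"the inputs or decision context may differ.")


if __name__ == "__main__":
    ledger = ExplanationLedger()
    print(ledger.check(ExplanationRecord("loan-42", "v1", {"income": 0.6, "debt": -0.3})))
    print(ledger.check(ExplanationRecord("loan-42", "v2", {"income": 0.1, "debt": -0.7})))
```

Run on the two hypothetical "loan-42" explanations, the sketch first records a baseline and then reports an inconsistency, attributing the change to the update from model v1 to v2 rather than simply contradicting the earlier explanation.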


 


TANGO – it takes two to tango – a synergistic approach to human-machine decision making. An EU Horizon-funded project.

 

Figure 1: Types of incoherence

Figure 3: Ensuring consistency with previous explanations

Figure 4: Patch model with important issues


https://alandix.com/academic/papers/synergy2025-XAI-coherence/

Alan Dix 21/4/2025