Systematic Literature Review: XAI and Clinical Decision Support

Thomas M. Connolly (DS Partnership, UK), Mario Soflano (Glasgow Caledonian University, UK), and Petros Papadopoulos (University of Strathclyde, UK)
DOI: 10.4018/978-1-6684-5092-5.ch008


Machine learning (ML) applications hold significant promise for innovation within healthcare; however, their full potential has not yet been realised, with limited reports of their clinical and cost benefits in clinical practice. This is due to complex clinical, ethical, and legal questions arising from the lack of understanding about how some ML models operate and come to make decisions. eXplainable AI (XAI) is an approach to help address this problem and make ML models understandable. This chapter reports on a systematic literature review investigating the use of XAI in healthcare within the last six years. Three research questions, identified as open issues in the literature, were examined: how bias was dealt with, which XAI techniques were used, and how the applications were evaluated. Findings show that, other than class imbalance and missing values, no other types of bias were accounted for in the shortlisted papers. There were no evaluations of the explainability outputs with clinicians, and none of the shortlisted papers used an interventional study or RCT.
Chapter Preview


Chapter 2 provided an introduction to eXplainable AI (XAI). Explainable AI aims to explain the way that AI systems work. At a high level, two types of models can be distinguished:

  • models that are inherently explainable - simple, transparent and easy to understand, sometimes referred to as white-box or transparent models;

  • models that are black-box in nature and require explanation through separate, replicating (surrogate) models that mimic the behaviour of the original model.

White-box systems include decision trees (DT), decision rules (DR), linear regression (LR), logistic regression (LogR), Generalised Linear Model (GLM), Generalised Additive Model (GAM), Naïve Bayes and K-Nearest Neighbour (KNN). Black-box systems include neural networks (NNs) (including deep, recurrent and convolutional neural nets), support vector machines (SVM), random forests (RF) and ensemble methods that combine and aggregate the results of several different models. There is significant debate in the literature on the definition of explainability, but Arrieta et al. (2020) define it as “the details and reasons a model gives to make its functioning clear or easy to understand”.

Key Terms in this Chapter

SHapley Additive ExPlanations (SHAP): A post-hoc, mainly model-agnostic, local method that uses concepts from cooperative game theory to define a ‘Shapley value’ for a feature of interest that provides a measurement of its influence on the underlying model’s prediction.
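
The Shapley averaging that SHAP approximates can be computed exactly for a tiny model. The sketch below is purely illustrative (the three-feature model, instance, and baseline are all invented): it enumerates every feature ordering and averages each feature's marginal contribution to the prediction.

```python
from itertools import permutations

# Invented toy model: prediction depends on three features.
def model(x1, x2, x3):
    return 2 * x1 + x2 * x3

# Instance to explain and a baseline (e.g. dataset means), both invented.
instance = {"x1": 1.0, "x2": 2.0, "x3": 3.0}
baseline = {"x1": 0.0, "x2": 0.0, "x3": 0.0}

def value(coalition):
    """Model output with coalition features at instance values, rest at baseline."""
    x = {f: (instance[f] if f in coalition else baseline[f]) for f in instance}
    return model(x["x1"], x["x2"], x["x3"])

def shapley(feature):
    """Average marginal contribution of `feature` over all feature orderings."""
    perms = list(permutations(instance))
    total = 0.0
    for order in perms:
        before = set(order[:order.index(feature)])
        total += value(before | {feature}) - value(before)
    return total / len(perms)

phi = {f: shapley(f) for f in instance}
# Local accuracy: Shapley values sum to f(instance) - f(baseline).
print(phi, sum(phi.values()))
```

Note the interaction term `x2 * x3` is split evenly between the two interacting features, while the additive `2 * x1` term is attributed entirely to `x1` — the averaging over orderings is what makes this split fair.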

Random Forest (RF): A predictive model built by combining and averaging the results from multiple, possibly thousands of, decision trees that are trained on random subsets of the features and training data. It can be used for both classification and regression problems.

Accuracy: The proportion of the total number of predictions that are correct.

Multilayer Perceptron (MLP): A fully connected class of feedforward artificial neural network (ANN).

AdaBoost (Adaptive Boosting): A classifier that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset, with the weights of incorrectly classified instances adjusted so that subsequent classifiers focus more on difficult cases.

Support Vector Machine (SVM): A classifier that uses a mapping function (kernel) to construct a separating boundary between two classes in a high-dimensional space.

Logistic Regression (LogR): A classification algorithm used to predict a binary outcome based on a set of independent variables.
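
A minimal sketch of how a fitted logistic regression produces a prediction — the intercept and coefficients below are invented, not taken from any real model:

```python
import math

# Invented fitted coefficients: intercept plus two predictors.
b0, b1, b2 = -1.5, 0.8, 2.0

def predict_proba(x1, x2):
    """Linear combination of the predictors squashed through the sigmoid."""
    z = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-z))

p = predict_proba(1.0, 0.5)   # probability of the positive class
label = int(p >= 0.5)         # binary outcome at a 0.5 threshold
print(p, label)
```

The 0.5 threshold is the conventional default; in clinical settings it is often tuned to trade sensitivity against specificity.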

Area Under the Curve (AUC): AUC turns the ROC curve into a numeric representation of performance for a binary classifier and represents the degree to which the model is capable of distinguishing between classes. For a “random” model the AUC-ROC would be 0.5 and for a “perfect” model it would be 1.
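
AUC can equivalently be computed as the probability that a randomly chosen positive instance is scored above a randomly chosen negative one (the Mann-Whitney formulation), which makes for a compact sketch — the scores and labels below are invented:

```python
def auc(scores, labels):
    """AUC as the probability a random positive outranks a random negative,
    counting ties as 0.5 (equivalent to the area under the ROC curve)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   1,    0,   0,   0]
print(auc(scores, labels))  # 14 of 16 positive/negative pairs ranked correctly
```

A perfectly separating classifier scores every positive above every negative (AUC = 1); random scores win half the pairs (AUC = 0.5).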

Negative Predictive Value (NPV): The proportion of predicted negative cases that are true negatives rather than false negatives (FN).

Sensitivity or Recall: The proportion of actual positive cases that are correctly identified.

Decision Tree (DT): A supervised learning algorithm with rules represented as a hierarchical tree. It can be used for both classification and regression problems.

Deep Learning: The “deep” in deep learning refers to the depth of the layers in a neural network. A neural network that consists of more than three layers, including input and output layers, can be considered a deep learning algorithm.

F1 Score: Represents the harmonic mean of precision and sensitivity, balancing the two measures given that improving one typically comes at the expense of the other. The higher the score, the better the performance.
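
Several of the performance metrics defined in this glossary follow directly from the four confusion-matrix counts. A minimal sketch with invented counts:

```python
# Invented confusion-matrix counts for a binary classifier:
# true positives, false positives, true negatives, false negatives.
tp, fp, tn, fn = 40, 10, 45, 5

accuracy    = (tp + tn) / (tp + fp + tn + fn)  # proportion of all predictions correct
precision   = tp / (tp + fp)                   # PPV: predicted positives that are real
recall      = tp / (tp + fn)                   # sensitivity: actual positives found
specificity = tn / (tn + fp)                   # actual negatives correctly identified
npv         = tn / (tn + fn)                   # predicted negatives that are real
f1          = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, recall, specificity, npv, f1)
```

Note that precision/NPV condition on the *predicted* class while sensitivity/specificity condition on the *actual* class — a distinction that matters when classes are imbalanced, as is common in clinical data.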

Machine Learning (ML): An application of AI that enables systems to learn and improve from experience without being explicitly programmed. Machine learning focuses on developing computer programs that can access data and use it to learn for themselves.

ROC (Receiver Operating Characteristics): A graphical plot of the true positive rate against the false positive rate at various threshold settings.

Artificial Neural Network (ANN): Consists of a layer of input nodes and a layer of output nodes, connected by one or more layers of hidden nodes. Input layer nodes pass information to hidden layer nodes by firing activation functions, and hidden layer nodes fire or remain dormant depending on the evidence presented. The hidden layers apply weighting functions to the evidence, and when the value of a particular node or set of nodes in the hidden layer reaches some threshold, a value is passed to one or more nodes in the output layer.
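
A forward pass through such a network can be sketched in plain Python. Everything here is invented for illustration — the weights, biases, and the choice of sigmoid activation — and real networks learn these weights from data:

```python
import math

def forward(x, weights_hidden, bias_hidden, weights_out, bias_out):
    """One forward pass: input layer -> hidden layer (sigmoid) -> output node."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    # Each hidden node weighs the inputs, adds its bias, and applies the activation.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(weights_hidden, bias_hidden)]
    # The output node weighs the hidden activations the same way.
    return sigmoid(sum(w * h for w, h in zip(weights_out, hidden)) + bias_out)

# Invented weights for a 2-input, 2-hidden-node, 1-output network.
y = forward([1.0, 0.5],
            weights_hidden=[[0.4, -0.2], [0.3, 0.8]],
            bias_hidden=[0.1, -0.1],
            weights_out=[1.5, -1.0],
            bias_out=0.2)
print(y)  # a value between 0 and 1
```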

Positive Predictive Value (PPV) or Precision: The proportion of predicted positive cases that are true positives rather than false positives.

Explainable AI (XAI): A suite of machine learning techniques that produce more explainable models while maintaining a high level of learning performance, enabling users to understand, trust, and effectively manage AI models.

Clinical Decision Support System (CDSS): A technology and a system architecture that uses medical knowledge with clinical data to provide customised advice for an individual patient's care.

Precision-Recall (PR): A graphical plot of precision against sensitivity to show the trade-off between the two measures at different threshold settings.

Local Interpretable Model-Agnostic Explanations (LIME): A post-hoc, model-agnostic, local XAI method for black-box models. It generates the explanation by approximating the ML model using an interpretable one (a linear model or decision tree) by sampling data points at random around an input instance and establishing local feature importance that represents the primary drivers supporting the prediction, weighted by their proximity to the original input instance.
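
The sample-and-fit procedure described above can be sketched in a few lines of NumPy. The black-box function, kernel width, and sample count below are illustrative choices rather than LIME's actual defaults, and the surrogate here is a plain weighted linear model:

```python
import numpy as np

np.random.seed(0)

# Invented black-box model we want to explain locally.
def black_box(x):
    return np.sin(x[0]) + x[1] ** 2

x0 = np.array([1.0, 2.0])  # input instance to explain

# 1. Sample perturbed data points at random around the instance.
Z = x0 + np.random.normal(scale=0.5, size=(500, 2))
y = np.array([black_box(z) for z in Z])

# 2. Weight samples by proximity to x0 (Gaussian kernel, invented width).
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.5)

# 3. Fit a weighted linear surrogate y ~ b0 + b1*z1 + b2*z2
#    (row-scaling by sqrt(w) turns ordinary least squares into weighted LS).
X = np.column_stack([np.ones(len(Z)), Z])
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# beta[1] and beta[2] are the local feature importances; for this toy model
# they approximate the local gradient (cos(1) ~ 0.54 and 2*x2 = 4).
print(beta[1], beta[2])
```

The surrogate is only trusted near `x0`: the proximity weights make distant, poorly fitted samples contribute almost nothing to the fit.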

Specificity: The proportion of actual negative cases that are correctly identified.

Artificial Intelligence (AI): Systems that think and act like humans; systems that think and act rationally.
