Sequential Conditional (Marginally Optimal) Transport on Probabilistic Graphs for Interpretable Counterfactual Fairness
Introduction
This ebook provides the replication code for the project titled ‘Sequential Conditional (Marginally Optimal) Transport on Probabilistic Graphs for Interpretable Counterfactual Fairness.’
All code is written in R, except for multivariate transport, for which we switch to Python.
Abstract
In this paper, we link two existing approaches to derive counterfactuals: adaptations based on a causal graph, as suggested in Plečko and Meinshausen (2020), and optimal transport, as in De Lara et al. (2024). We extend “Knothe’s rearrangement” (Bonnotte, 2013) and “triangular transport” (Zech and Marzouk, 2022) to probabilistic graphical models, and use this counterfactual approach, referred to as sequential transport, to discuss individual fairness. After establishing the theoretical foundations of the proposed method, we demonstrate its application through numerical experiments on both synthetic and real datasets.
Keywords: Machine Learning (ML) -> ML: Ethics – Bias, Fairness, Transparency & Privacy
Outline
This ebook is made of three parts:
- Optimal Transport: we provide some background on optimal transport (Chapter 1 Optimal Transport).
- Simulations: using data simulated from bivariate Gaussian distributions in two subgroups of the population (\(S=0\) and \(S=1\)), we illustrate the sequential transport algorithm (Chapter 2 Gaussian Simulations). Then, we demonstrate how this algorithm can be used in an interpretable counterfactual fairness context (Chapter 3 Regression).
- Real data: the last part works through an example with real data. The law dataset used as an illustration is first presented (Chapter 4 Data). In this dataset, the individuals (students) may be part of a protected group (\(S=0\)) or not (\(S=1\)). Then, a GLM is estimated to predict a binary outcome (Chapter 5 Classifier). We then present three methods to produce counterfactuals from group \(S=0\) to group \(S=1\): fairadapt (Chapter 6 Fairadapt), multivariate optimal transport (Chapter 7 Multivariate Optimal Transport), and sequential transport (Chapter 8 Sequential Transport). A comparison of the results is presented in Chapter 9 (Counterfactuals: comparison).
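To give a feel for what the simulation chapters do, here is a minimal sketch of sequential transport between two bivariate Gaussian groups. It is not the code from the replication scripts. It assumes a causal ordering \(X_1 \to X_2\), known Gaussian parameters (the values in `params_0` and `params_1` are made up for illustration), and the closed-form monotone transport map between univariate Gaussians; the function names are hypothetical. \(X_1\) is transported marginally, then \(X_2\) is transported conditionally on \(X_1\), in the spirit of Knothe's rearrangement.

```r
# Illustrative (made-up) parameters for the two groups
params_0 <- list(mu = c(-1, -1), sd = c(1.2, 1.0), rho = 0.5)    # group S = 0
params_1 <- list(mu = c(1.5, 1.5), sd = c(0.9, 1.1), rho = -0.3) # group S = 1

# Closed-form monotone transport map between two univariate Gaussians
transport_gaussian <- function(x, mu_from, sd_from, mu_to, sd_to) {
  mu_to + (sd_to / sd_from) * (x - mu_from)
}

# Mean and standard deviation of X2 | X1 = x1 for a bivariate Gaussian
cond_x2_given_x1 <- function(x1, p) {
  list(
    mu = p$mu[2] + p$rho * p$sd[2] / p$sd[1] * (x1 - p$mu[1]),
    sd = p$sd[2] * sqrt(1 - p$rho^2)
  )
}

# Sequential transport under the assumed ordering X1 -> X2
sequential_transport <- function(x, p_from, p_to) {
  # Step 1: transport X1 marginally
  x1_t <- transport_gaussian(
    x[, 1], p_from$mu[1], p_from$sd[1], p_to$mu[1], p_to$sd[1]
  )
  # Step 2: transport X2 from its conditional law given X1 in the source
  # group to its conditional law given the transported X1 in the target group
  c_from <- cond_x2_given_x1(x[, 1], p_from)
  c_to <- cond_x2_given_x1(x1_t, p_to)
  x2_t <- transport_gaussian(x[, 2], c_from$mu, c_from$sd, c_to$mu, c_to$sd)
  cbind(x1 = x1_t, x2 = x2_t)
}

# Simulate group S = 0 and compute counterfactuals in group S = 1
set.seed(123)
n <- 5000
z <- matrix(rnorm(2 * n), ncol = 2)
x_0 <- cbind(
  params_0$mu[1] + params_0$sd[1] * z[, 1],
  params_0$mu[2] + params_0$sd[2] *
    (params_0$rho * z[, 1] + sqrt(1 - params_0$rho^2) * z[, 2])
)
x_0_t <- sequential_transport(x_0, params_0, params_1)

# Empirical moments of the transported sample, to compare with params_1
colMeans(x_0_t)
cor(x_0_t)[1, 2]
```

Under these assumptions, the transported sample should have moments close to those of group \(S=1\). The ordering of the variables matters: reversing the assumed graph to \(X_2 \to X_1\) would generally yield different counterfactuals.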
Replication Codes
The code to replicate the results displayed in the paper is presented in this ebook. The supplementary materials are organized as follows:
Supplementary-materials
├── replication_book
│   ├── index.html
│   └── ...
├── data
├── functions
│   ├── utils.R
│   └── graphs.R
└── scripts
    ├── 01_optimal_transport.R
    ├── 02_gaussian.R
    ├── 03_regression.R
    ├── 04_1_law_data.R
    ├── 04_2_law_classifier.R
    ├── 04_3_law_fairadapt.R
    ├── 04_4a_law_optimal_transport.R
    ├── 04_4b_law_optimal_transport.R
    ├── 04_5_law_sequential_transport.R
    ├── 04_6_comparison.R
    └── sequential_transport.Rproj
To replicate the results, provided that R and RStudio are installed on your computer, double-click the following file to open RStudio (so that the correct working directory is set): Supplementary-materials/scripts/sequential_transport.Rproj. The scripts can then be run.