Sequential Conditional (Marginally Optimal) Transport on Probabilistic Graphs for Interpretable Counterfactual Fairness

Authors
Affiliations

Arthur Charpentier

Université du Québec à Montréal

Agathe Fernandes Machado

Université du Québec à Montréal

Ewen Gallic

Aix-Marseille School of Economics, Aix-Marseille Univ.

Published

August 6, 2024

Introduction

This ebook provides the replication code for the project titled ‘Sequential Conditional (Marginally Optimal) Transport on Probabilistic Graphs for Interpretable Counterfactual Fairness.’

Note

All the code is written in R, except for multivariate optimal transport, for which we switch to Python.

Abstract

In this paper, we link two existing approaches to derive counterfactuals: adaptations based on a causal graph, as suggested in Plečko and Meinshausen (2020), and optimal transport, as in De Lara et al. (2024). We extend “Knothe’s rearrangement” (Bonnotte 2013) and “triangular transport” (Zech and Marzouk 2022) to probabilistic graphical models, and use this counterfactual approach, referred to as sequential transport, to discuss individual fairness. After establishing the theoretical foundations of the proposed method, we demonstrate its application through numerical experiments on both synthetic and real datasets.

Keywords: Machine Learning (ML) -> ML: Ethics – Bias, Fairness, Transparency & Privacy

Outline

This ebook consists of three parts:

  1. Optimal Transport
    We provide some background on optimal transport (Chapter 1  Optimal Transport).
  2. Simulations
    Using data simulated from bivariate Gaussian distributions in two subgroups of the population (\(S=0\) and \(S=1\)), we illustrate the sequential transport algorithm (Chapter 2  Gaussian Simulations). Then, we demonstrate how this algorithm can be used in an interpretable counterfactual fairness context (Chapter 3  Regression).
  3. The last part shows an example with real data. The law dataset used as an illustration is first presented (Chapter 4  Data). In this dataset, the individuals (students) may be part of a protected group (\(S=0\)) or not (\(S=1\)). Then, a GLM is estimated to predict a binary outcome (Chapter 5  Classifier). We then present three methods to produce counterfactuals from group \(S=0\) to group \(S=1\): fairadapt (Chapter 6  Fairadapt), multivariate optimal transport (Chapter 7  Multivariate Optimal Transport), and sequential transport (Chapter 8  Sequential Transport). A comparison of the results is presented in Chapter 9  Counterfactuals: comparison.
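
To give a feel for what the Gaussian simulations illustrate, here is a minimal sketch of sequential transport between two bivariate Gaussian groups. It is not taken from the replication scripts (which are in R for this part). It assumes a two-node graph in which the first variable is a parent of the second, and it relies on the fact that Gaussian marginals and conditionals are again Gaussian, so each step is an affine quantile map. The function name `gaussian_sequential_transport` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_sequential_transport(x, mu0, cov0, mu1, cov1):
    """Map points x (shape (n, 2)) from N(mu0, cov0) to N(mu1, cov1).

    The first coordinate is transported with its univariate monotone map,
    then the second coordinate is transported conditionally on the first.
    Assumes the variable ordering x1 -> x2 (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    mu0, mu1 = np.asarray(mu0, dtype=float), np.asarray(mu1, dtype=float)
    cov0, cov1 = np.asarray(cov0, dtype=float), np.asarray(cov1, dtype=float)

    # Step 1: transport x1 between the two Gaussian marginals.
    sd1_0, sd1_1 = np.sqrt(cov0[0, 0]), np.sqrt(cov1[0, 0])
    x1_t = mu1[0] + (sd1_1 / sd1_0) * (x[:, 0] - mu0[0])

    # Step 2: conditional law of x2 given x1 in each group
    # (linear conditional mean, constant conditional standard deviation).
    slope0 = cov0[0, 1] / cov0[0, 0]
    slope1 = cov1[0, 1] / cov1[0, 0]
    cond_mean0 = mu0[1] + slope0 * (x[:, 0] - mu0[0])   # given the original x1
    cond_mean1 = mu1[1] + slope1 * (x1_t - mu1[0])      # given the transported x1
    cond_sd0 = np.sqrt(cov0[1, 1] - cov0[0, 1] ** 2 / cov0[0, 0])
    cond_sd1 = np.sqrt(cov1[1, 1] - cov1[0, 1] ** 2 / cov1[0, 0])

    # Transport x2 between the two conditional Gaussians.
    x2_t = cond_mean1 + (cond_sd1 / cond_sd0) * (x[:, 1] - cond_mean0)
    return np.column_stack([x1_t, x2_t])


if __name__ == "__main__":
    # Illustrative parameters only (not those used in the paper).
    mu_s0, cov_s0 = [-1.0, -1.0], [[1.44, 0.78], [0.78, 1.69]]
    mu_s1, cov_s1 = [1.5, -2.0], [[0.81, -0.324], [-0.324, 0.81]]

    rng = np.random.default_rng(0)
    sample_s0 = rng.multivariate_normal(mu_s0, cov_s0, size=5)
    print(gaussian_sequential_transport(sample_s0, mu_s0, cov_s0, mu_s1, cov_s1))
```

In this Gaussian setting the resulting map is affine with a lower-triangular linear part, which is what makes the counterfactual easy to read: the change in the first variable is computed on its own, and the change in the second variable is then computed given the first. With non-Gaussian data the two steps would instead rely on estimated (conditional) distribution and quantile functions.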

Replication Codes

The code to replicate the results displayed in the paper is presented in this ebook. The supplementary materials are organized as follows:

Supplementary-materials
├── replication_book
│   ├── index.html
│   └── ...
├── data
├── functions
│   ├── utils.R
│   └── graphs.R
└── scripts
    ├── 01_optimal_transport.R
    ├── 02_gaussian.R
    ├── 03_regression.R
    ├── 04_1_law_data.R
    ├── 04_2_law_classifier.R
    ├── 04_3_law_fairadapt.R
    ├── 04_4a_law_optimal_transport.R
    ├── 04_4b_law_optimal_transport.R
    ├── 04_5_law_sequential_transport.R
    ├── 04_6_comparison.R
    └── sequential_transport.Rproj

To replicate the results, provided you have installed R and RStudio on your computer, double-click the following file to open RStudio (so that the correct working directory is set): Supplementary-materials/scripts/sequential_transport.Rproj. The scripts can then be run.