Sequential Conditional (Marginally Optimal) Transport on Probabilistic Graphs for Interpretable Counterfactual Fairness

Authors and affiliations

Arthur Charpentier, Université du Québec à Montréal

Agathe Fernandes Machado, Université du Québec à Montréal

Ewen Gallic, Aix-Marseille School of Economics, Aix-Marseille Univ.

Published: December 18, 2024

Introduction

This ebook provides the replication code for the article ‘Sequential Conditional (Marginally Optimal) Transport on Probabilistic Graphs for Interpretable Counterfactual Fairness.’

Extended Version of the Paper

An extended version of the paper is available on arXiv: https://arxiv.org/abs/2408.03425

Note

All the code is written in R, except for multivariate optimal transport, for which we switch to Python.

Abstract

In this paper, we link two existing approaches to derive counterfactuals: adaptations based on a causal graph, as suggested in Plečko and Meinshausen (2020), and optimal transport, as in De Lara et al. (2024). We extend “Knothe’s rearrangement” (Bonnotte 2013) and “triangular transport” (Zech and Marzouk 2022) to probabilistic graphical models, and use this counterfactual approach, referred to as sequential transport, to discuss fairness at the individual level. After establishing the theoretical foundations of the proposed method, we demonstrate its application through numerical experiments on both synthetic and real datasets.
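To make the idea of sequential transport concrete, here is a minimal sketch in the simplest setting, assuming a DAG in which \(X_1\) is the only parent of \(X_2\), and that \((X_1, X_2)\) is bivariate Gaussian within each group \(S=0\) and \(S=1\). Under these assumptions each step transports a univariate (conditional) Gaussian distribution, which has a closed-form affine map, and composing the two steps gives a triangular (Knothe-type) map. The function name `gaussian_sequential_transport` and the parameter values are hypothetical, chosen for illustration only. They are not taken from {seqtransfairness} or from the simulation settings used in the ebook.

```python
import numpy as np


def gaussian_sequential_transport(x, mean0, cov0, mean1, cov1):
    """Sequentially transport points x (n x 2) from N(mean0, cov0), group S=0,
    to N(mean1, cov1), group S=1, assuming the DAG X1 -> X2.

    Illustrative sketch only; not the {seqtransfairness} implementation.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mean0, mean1 = np.asarray(mean0, dtype=float), np.asarray(mean1, dtype=float)
    cov0, cov1 = np.asarray(cov0, dtype=float), np.asarray(cov1, dtype=float)

    # Step 1: X1 has no parent, so its marginal is transported. Between two
    # univariate Gaussians, quantile matching reduces to an affine map.
    sd0, sd1 = np.sqrt(cov0[0, 0]), np.sqrt(cov1[0, 0])
    x1_t = mean1[0] + sd1 / sd0 * (x[:, 0] - mean0[0])

    # Step 2: X2 is transported conditionally on its parent. The source
    # conditional N(m_0(x1), v_0) is evaluated at the observed x1, the target
    # conditional N(m_1(x1_t), v_1) at the transported value x1_t.
    def conditional(mean, cov, x1):
        cond_mean = mean[1] + cov[0, 1] / cov[0, 0] * (x1 - mean[0])
        cond_sd = np.sqrt(cov[1, 1] - cov[0, 1] ** 2 / cov[0, 0])
        return cond_mean, cond_sd

    m0, s0 = conditional(mean0, cov0, x[:, 0])
    m1, s1 = conditional(mean1, cov1, x1_t)
    x2_t = m1 + s1 / s0 * (x[:, 1] - m0)

    return np.column_stack([x1_t, x2_t])


# Usage: simulate group S=0 (arbitrary parameters), transport it, and compare
# the moments of the transported sample with those of group S=1.
rng = np.random.default_rng(2024)
mean0, cov0 = [-1.0, -1.0], [[1.2, 0.5], [0.5, 1.0]]
mean1, cov1 = [1.5, 1.5], [[1.0, -0.4], [-0.4, 1.5]]
x0 = rng.multivariate_normal(mean0, cov0, size=10_000)
x0_t = gaussian_sequential_transport(x0, mean0, cov0, mean1, cov1)
print(x0_t.mean(axis=0))            # should be close to mean1
print(np.cov(x0_t, rowvar=False))   # should be close to cov1
```

Because the target conditional is evaluated at the transported value `x1_t` rather than at `x1`, the counterfactual value of \(X_2\) reflects the change in its parent. This distinguishes the sequential construction from transporting each coordinate separately.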

Keywords: Machine Learning (ML) -> ML: Ethics – Bias, Fairness, Transparency & Privacy

Outline

This ebook is made of four parts:

The first part, Optimal Transport, provides some background on optimal transport (Chapter 1 Optimal Transport).
The second part, Simulations, uses data simulated from bivariate Gaussian distributions in two subgroups of the population (\(S=0\) and \(S=1\)) to illustrate the sequential transport algorithm (Chapter 2 Gaussian Simulations). We then demonstrate how this algorithm can be used in an interpretable counterfactual fairness context (Chapter 3 Fast Transport on a Grid with Numerical Covariates), and present another algorithm that can be used when the covariates are not all numeric (Chapter 4 Faster Algorithm). Lastly, we explore what happens when the assumed DAG is wrong (Chapter 5 Wrong Causal Assumptions).
The third part shows an example with real data. The law school dataset used as an illustration is first presented (Chapter 6 Data). In this dataset, the individuals (students) may be part of a protected group (\(S=0\)) or not (\(S=1\)). Then, a GLM is estimated to predict a binary outcome (Chapter 7 Classifier). We then present three methods to produce counterfactuals from group \(S=0\) to group \(S=1\): fairadapt (Chapter 9 Fairadapt), multivariate optimal transport (Chapter 10 Multivariate Optimal Transport), and sequential transport (Chapter 11 Sequential Transport). Finally, the results are compared (Chapter 12 Counterfactuals: comparison).
The fourth part replicates the analysis of the third part on the UCI Adult dataset (Chapter 13 Adult Dataset) and the COMPAS dataset (Chapter 14 COMPAS Dataset).
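As a point of comparison with sequential transport, the sketch below illustrates what a multivariate optimal transport counterfactual computes. It is not necessarily the approach taken in the `*-ot.py` scripts. It assumes two samples of equal size with uniform weights and a squared Euclidean cost. In that case an optimal transport plan can be taken to be a permutation, so the discrete problem reduces to an assignment problem, solved here with SciPy's `scipy.optimize.linear_sum_assignment`. Each individual of group \(S=0\) then receives, as counterfactual, the individual of group \(S=1\) it is matched to. The function name `empirical_ot_counterfactuals` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def empirical_ot_counterfactuals(x0, x1):
    """Optimal matching between two samples of equal size (uniform weights,
    squared Euclidean cost). Returns, for each row of x0, its matched row of
    x1 (used as a counterfactual), and the total transport cost."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    if x0.shape != x1.shape:
        raise ValueError("this sketch assumes two samples of the same shape")
    # Pairwise squared Euclidean distances, shape (n, n).
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(axis=-1)
    # With uniform weights and equal sizes, an optimal plan is a permutation.
    rows, cols = linear_sum_assignment(cost)
    counterfactuals = np.empty_like(x0)
    counterfactuals[rows] = x1[cols]
    return counterfactuals, cost[rows, cols].sum()


# Usage on two simulated groups (arbitrary parameters).
rng = np.random.default_rng(1)
x0 = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(200, 2))
x1 = rng.normal(loc=[1.0, 1.0], scale=1.0, size=(200, 2))
x0_cf, total_cost = empirical_ot_counterfactuals(x0, x1)
print(total_cost)
```

Unlike the sequential approach, this matching ignores any causal graph: all covariates are moved jointly. Unequal group sizes would require a general (non-permutation) transport plan, which this sketch does not cover.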

Small Package

We defined some of the functions used in this ebook in a small R package, {seqtransfairness}, which can be downloaded from the GitHub repository associated with the paper.

To install the package:

remotes::install_github(
  repo = "fer-agathe/sequential_transport", subdir = "seqtransfairness"
)

Then, the package can be loaded as follows:

library(seqtransfairness)

Alternatively, if you have downloaded the GitHub repository, you will find a copy of the package in the seqtransfairness folder. To load it, the load_all() function from the R package {devtools} can be used; this is what the scripts do.

Help Pages

The functions, placed in the seqtransfairness/R/ folder, are documented in the package. Hence, once the package is loaded (e.g., with the load_all() function), the help pages can be accessed by typing, for example, ?seqtransfairness or ?seq_trans in the R console.

Replication Codes

Download the Replication codes (zip file, 314 KB)

The code to replicate the results displayed in the paper is presented in this ebook. The replication files have the following structure:

├── data
│   └── law_data.csv
├── functions
│   ├── utils.R
│   └── graphs.R
├── seqtransfairness
├── scripts
│   ├── 01_optimal_transport.R
│   ├── 02_gaussian.R
│   ├── 03_03_transport-grid.R
│   ├── 04_algorithm-5.R
│   ├── 05_wrong-causal-assumptions.R
│   ├── 05_wrong-causal-assumptions-ot.py
│   ├── 06_real-data-lawschool.R
│   ├── 06_real-data-lawschool-ot.py
│   ├── 07_real-data-adult.R
│   ├── 07_real-data-adult-ot.py
│   ├── 08_real-data-compas.R
│   ├── 08_read-data-compas-ot.py
│   └── sequential_transport.Rproj
└── README.md

To replicate the results, provided that R, RStudio, and Python (for multivariate optimal transport) are installed on your computer, double-click the following file to open RStudio with the correct working directory: ./scripts/sequential_transport.Rproj. You can then open the scripts from RStudio.

Replication of Figures and Tables

The following Figures from the paper are produced using R:

Figure 4: 04_algorithm-5.R
Figure 6: 02_gaussian.R
Figure 7: 04_algorithm-5.R
Figure 9: 06_real-data-lawschool.R
Figure 10 (Appendix): 01_optimal_transport.R
Figure 11 (Appendix): 01_optimal_transport.R
Figure 12 (Appendix): 02_gaussian.R
Figure 13 (Appendix): 02_gaussian.R
Figure 16 (Appendix): 07_real-data-adult.R (left) and 08_real-data-compas.R (right)
Figure 18 (Appendix): 05_wrong-causal-assumptions.R
Figure 19 (Appendix): 05_wrong-causal-assumptions.R

The other figures are TikZ pictures showing DAGs.

The following Tables are produced using R:

Table 1: 06_real-data-lawschool.R
Table 2 (Appendix): 07_real-data-adult.R (top) and 08_real-data-compas.R (bottom)
Table 3 (Appendix): 06_real-data-lawschool.R