7 Objectives
The chapters of this part of the ebook present the following methods to build counterfactuals for a categorical variable:
- Using matching (Chapter 11),
- Using optimal transport on label-encoded variables (Chapter 9),
- Using optimal transport on the simplex (Chapter 10).
\[ \definecolor{wongBlack}{RGB}{0,0,0} \definecolor{wongGold}{RGB}{230, 159, 0} \definecolor{wongLightBlue}{RGB}{86, 180, 233} \definecolor{wongGreen}{RGB}{0, 158, 115} \definecolor{wongYellow}{RGB}{240, 228, 66} \definecolor{wongBlue}{RGB}{0, 114, 178} \definecolor{wongOrange}{RGB}{213, 94, 0} \definecolor{wongPurple}{RGB}{204, 121, 167} %\definecolor{colA}{RGB}{255, 221, 85} %\definecolor{colB}{RGB}{148, 78, 223} %\definecolor{colC}{RGB}{63, 179, 178} \definecolor{colA}{RGB}{0, 114, 178} \definecolor{colB}{RGB}{213, 94, 0} \definecolor{colC}{RGB}{204, 121, 167} \definecolor{colGpeZero}{RGB}{0,160,138} \definecolor{colGpeUn}{RGB}{242, 173, 0} \]
Consider a categorical variable \(x \in \{{\color{colA}A}, {\color{colB}B}, {\color{colC}C}\}\) whose distribution differs between groups. The categorical variable could be, for example, the treatment administered for a disease: A=surgery, B=medication, C=no treatment. We consider two groups (for example, Black and non-Black individuals). We want to build, for each individual in one group, a counterfactual category in the other group.
Let Group 0 and Group 1 represent two subpopulations in which the distribution of \(x\) differs. For example, we can have:
- in Group 0, the category distribution is \(\boldsymbol{p}_0 = (0.1, 0.5, 0.4)\),
- in Group 1, the category distribution is \(\boldsymbol{p}_1 = (0.5, 0.3, 0.2)\).
Group 0 could be, for example, Black individuals, whereas Group 1 could be individuals who are not Black.
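To make the setup concrete, the two groups can be simulated by drawing categories from \(\boldsymbol{p}_0\) and \(\boldsymbol{p}_1\). The following is only a minimal sketch in Python with NumPy, not code from the ebook: the sample size `n`, the seed, and the helper `empirical_distribution` are assumptions introduced for illustration.

```python
import numpy as np

# Categories and group-specific distributions taken from the example above.
categories = np.array(["A", "B", "C"])
p0 = np.array([0.1, 0.5, 0.4])  # Group 0
p1 = np.array([0.5, 0.3, 0.2])  # Group 1

# Assumed for illustration: sample size and seed are arbitrary choices.
rng = np.random.default_rng(seed=1234)
n = 10_000

# Draw one category per individual in each group.
x0 = rng.choice(categories, size=n, p=p0)
x1 = rng.choice(categories, size=n, p=p1)


def empirical_distribution(x, categories):
    """Return the share of observations falling in each category."""
    return np.array([(x == c).mean() for c in categories])


freq0 = empirical_distribution(x0, categories)
freq1 = empirical_distribution(x1, categories)

print("Group 0 empirical shares:", dict(zip(categories, freq0.round(3))))
print("Group 1 empirical shares:", dict(zip(categories, freq1.round(3))))
```

With a sample of this size, the empirical shares should be close to \(\boldsymbol{p}_0\) and \(\boldsymbol{p}_1\). A counterfactual method then has to assign to each simulated individual of Group 0 a category such that the resulting shares resemble those of Group 1.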
The objective is to define, for each individual in Group 0, a counterfactual category that reflects the distributional characteristics of Group 1. That is, given a Black individual who received a given treatment (e.g., "C=no treatment"), we want to know which treatment (surgery, medication, no treatment) they would have received had they been non-Black.
To do so, the remainder of this part of the ebook revisits three approaches:
- Using matching (Chapter 11),
- Using optimal transport on label-encoded variables (Chapter 9),
- Using optimal transport on the simplex (Chapter 10), the approach used in our paper.