library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(randomForest)
randomForest 4.7-1.1
Type rfNews() to see new features/changes/bug fixes.
Attaching package: 'randomForest'
The following object is masked from 'package:dplyr':
combine
The following object is masked from 'package:ggplot2':
margin
library(future)
library(binom)
library(locfit)
locfit 1.5-9.9 2024-03-01
Attaching package: 'locfit'
The following object is masked from 'package:purrr':
none
library(pROC)
Type 'citation("pROC")' for a citation.
Attaching package: 'pROC'
The following objects are masked from 'package:stats':
cov, smooth, var
In Chapter 7, we trained 200 random forest classifiers and 200 random forest regressors. Each replication corresponded to a different split of the data into a calibration set and a test set. Note that all the forests were estimated on the same training set.
In this chapter, we present the calibration and goodness-of-fit metrics computed during the simulations.
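Before looking at the aggregated results, here is a minimal sketch of what one replication looks like, on simulated data. This is an illustration of the protocol described above, not the authors' simulation code: the data-generating process, the helper name one_replication(), and the two metrics shown are assumptions.
# Sketch of one replication: the forest is fit once on a fixed training set;
# each replication re-splits the remaining observations into calibration/test.
# (Illustrative only: data and helper name are not from the chapter.)
library(randomForest)
library(tibble)

set.seed(123)
n <- 1000
x <- matrix(rnorm(2 * n), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
d <- rbinom(n, size = 1, prob = plogis(x[, 1] - x[, 2]))
train_idx <- 1:500   # fixed training set, shared by all replications
rest_idx  <- 501:n   # re-split at each replication

fit <- randomForest(x = x[train_idx, ], y = factor(d[train_idx]))

one_replication <- function(seed) {
  set.seed(seed)
  calib_idx <- sample(rest_idx, size = length(rest_idx) / 2)
  test_idx  <- setdiff(rest_idx, calib_idx)
  metrics_on <- function(idx, label) {
    scores <- predict(fit, newdata = x[idx, ], type = "prob")[, "1"]
    tibble(
      seed     = seed,
      sample   = label,
      accuracy = mean(as.integer(scores > .5) == d[idx]),
      brier    = mean((scores - d[idx])^2)
    )
  }
  rbind(metrics_on(calib_idx, "Calibration"), metrics_on(test_idx, "Test"))
}
one_replication(1)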
# A tibble: 5,600 × 17
   sample  model mse   accuracy missclass_rate sensitivity specificity threshold
   <fct>   <fct> <lgl>    <dbl>          <dbl>       <dbl>       <dbl>     <dbl>
 1 Calibr… Regr… NA       0.852          0.148       0.758       0.923       0.5
 2 Test    Regr… NA       0.852          0.148       0.755       0.923       0.5
 3 Calibr… Regr… NA       0.860          0.140       0.804       0.902       0.5
 4 Calibr… Regr… NA       0.861          0.139       0.812       0.897       0.5
 5 Calibr… Regr… NA       0.860          0.140       0.825       0.887       0.5
 6 Calibr… Regr… NA       0.860          0.140       0.813       0.895       0.5
 7 Calibr… Regr… NA       0.860          0.140       0.822       0.890       0.5
 8 Calibr… Regr… NA       0.860          0.140       0.818       0.892       0.5
 9 Test    Regr… NA       0.857          0.143       0.794       0.903       0.5
10 Test    Regr… NA       0.858          0.142       0.803       0.899       0.5
# ℹ 5,590 more rows
# ℹ 9 more variables: FPR <dbl>, AUC <dbl>, locfit_score <dbl>, brier <dbl>,
#   ece <dbl>, qmse <dbl>, wmse <dbl>, seed <int>, method <fct>
8.3 Boxplots
Let us define two functions to display the goodness-of-fit metrics (boxplot_metrics_gof()) and the calibration metrics (boxplot_metrics_calib()) as boxplots.
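As an indication of their shape, here is a possible implementation of the first one, assuming the metrics are stored in a tibble like the one printed above (named metrics_simul here); the actual definitions used in the book may differ.
# Possible shape for boxplot_metrics_gof(); `metrics_simul` is assumed to be
# the metrics tibble shown above. (Sketch, not the book's actual definition.)
boxplot_metrics_gof <- function(metrics_simul) {
  metrics_simul |>
    select(sample, model, accuracy, sensitivity, specificity, AUC) |>
    pivot_longer(
      cols = !c(sample, model),
      names_to = "metric", values_to = "value"
    ) |>
    ggplot(aes(x = sample, y = value)) +
    geom_boxplot() +
    facet_grid(model ~ metric) +
    labs(x = NULL, y = NULL)
}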
We create a function to plot the calibration metrics against the goodness-of-fit metrics. Each point represents the result obtained in one simulation, computed either on the train set or on the test set.
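A minimal sketch of such a function, pairing one calibration metric (ECE) with one goodness-of-fit metric (AUC); the function name and the choice of metrics are illustrative assumptions.
# Sketch: one point per simulation, calibration (ece) vs goodness of fit (AUC).
# (Function name and metric choice are illustrative, not from the book.)
plot_calib_vs_gof <- function(metrics_simul) {
  metrics_simul |>
    ggplot(aes(x = AUC, y = ece, colour = sample)) +
    geom_point(alpha = .5) +
    facet_wrap(~model) +
    labs(x = "AUC (goodness of fit)", y = "ECE (calibration)")
}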
Now, we partition the [0,1] segment into equal-sized bins. For each sample of each simulation (and each recalibration technique), we then count the number of observations whose scores fall into each bin.
8.5.0.1 Helper Functions
First, the count_scores() function, which counts the number of observations in each bin, for a vector of scores.
#' For a vector of scores, compute the number of obs in each bin
#' The bins are defined on evenly separated values in [0,1]
count_scores <- function(scores) {
  breaks <- seq(0, 1, by = .05)
  if (!is.null(scores)) {
    n_bins <- table(cut(scores, breaks = breaks))
  } else {
    n_bins <- NA_integer_
  }
  tibble(
    bins = names(table(cut(breaks, breaks = breaks))),
    n_bins = as.vector(n_bins)
  )
}
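For instance, applied to 100 uniform draws, count_scores() returns one row per bin of width 0.05 (the call below is only an illustration):
# Illustration: bin counts for 100 uniform scores
# (returns a 20-row tibble with columns `bins` and `n_bins`)
count_scores(runif(100))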
Then, another function, count_scores_simul(), applies count_scores() to all the non-recalibrated scores in the simulations.
#' Applies the count_scores() function to a list of simulations
count_scores_simul <- function(scores_simul) {
  map(scores_simul, count_scores) |>
    list_rbind() |>
    group_by(bins) |>
    summarise(n_bins = mean(n_bins))
}
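For instance (an illustration, not taken from the simulations), averaging the bin counts over two simulated score vectors:
# Illustration: average bin counts across two simulated score vectors
count_scores_simul(list(runif(100), runif(100)))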
Lastly, the count_scores_simul_method() function does the same for all the recalibrated scores.
#' Extracts the recalibrated scores from a list of simulations and computes
#' the number of scores in bins (see count_scores())
#' For each simulation, there are multiple correction techniques
count_scores_simul_method <- function(scores_simul_methods) {
  map(scores_simul_methods, "scores_c") |>
    map(
      .f = function(simul) {
        map(
          simul,
          .f = function(sim_method) {
            count_scores_simul(sim_method$tb_score_c_train) |>
              mutate(sample = "train") |>
              bind_rows(
                count_scores_simul(sim_method$tb_score_c_calib) |>
                  mutate(sample = "calibration")
              ) |>
              bind_rows(
                count_scores_simul(sim_method$tb_score_c_test) |>
                  mutate(sample = "test")
              )
          }
        ) |>
          list_rbind(names_to = "method")
      },
      .progress = TRUE
    ) |>
    list_rbind() |>
    group_by(method, sample, bins) |>
    summarise(
      n_bins = mean(n_bins),
      .groups = "drop"
    )
}
The plot_calib_locfit_simuls() function allows us to visualize the calibration curves (on the calibration set and on the test set) for a type of forest (regression or classification), for both uncalibrated and recalibrated scores.
plot_calib_locfit_simuls <- function(calib_curve, type) {
  tb_calib_curve <- calib_curve |> filter(type == !!type)
  tb_n_bins <- n_bins |> filter(type == !!type)
  nb_methods <- tb_calib_curve$method |> unique() |> length()
  # Layout: for each method, a title row, then a barplot row and a
  # calibration-curve row, over two columns of samples
  mat_init <- c(1, 1, 2, 4, 3, 5) |> matrix(ncol = 2, byrow = TRUE)
  mat <- mat_init
  for (j in 2:(ceiling(nb_methods / 2))) {
    mat <- rbind(mat, mat_init + (5 * (j - 1)))
  }
  mat <- cbind(mat, mat + max(mat))
  layout(mat, heights = rep(c(.35, 1, 3), nb_methods))
  samples <- c("Calibration", "Test")
  # `methods` and `methods_labs` are defined earlier in the chapter
  for (i_method in 1:length(methods)) {
    method <- methods_labs[i_method]
    # Title panel for the method
    par(mar = c(0, 0.5, 0, 0.5))
    plot(
      0:1, 0:1, type = "n",
      xlim = c(0, 2), ylim = 0:1,
      xlab = "", ylab = "", main = "",
      xaxt = "n", yaxt = "n", frame.plot = FALSE
    )
    # Add the method name at the vertical center of the title panel
    text_y <- mean(par("usr")[3:4])
    text(
      1, text_y, method,
      col = colours_calib[i_method], cex = 1, font = 2
    )
    for (i_sample in 1:length(samples)) {
      sample <- samples[i_sample]
      tb_plot <- tb_calib_curve |>
        filter(sample == !!sample, method == !!method, type == !!type)
      n_bins_current <- tb_n_bins |>
        filter(sample == !!sample, method == !!method, type == !!type)
      n_bins_no_calib_current <- tb_n_bins |>
        filter(sample == !!sample, method == "No Calibration", type == !!type)
      # Barplots on top: score distribution before/after recalibration
      par(mar = c(0.5, 4.3, 1.0, 0.5))
      y_lim_bp <- range(
        c(n_bins_no_calib_current$n_bins, n_bins_current$n_bins),
        na.rm = TRUE
      )
      barplot(
        n_bins_no_calib_current$n_bins,
        col = adjustcolor("gray", alpha.f = .3),
        ylim = y_lim_bp, border = "white", axes = FALSE,
        xlab = "", ylab = "", main = ""
      )
      barplot(
        n_bins_current$n_bins,
        col = adjustcolor("#0072B2", alpha.f = .3),
        ylim = y_lim_bp, border = "white", axes = FALSE,
        xlab = "", ylab = "", main = "", add = TRUE
      )
      # Calibration curve with its confidence band
      par(mar = c(4.1, 4.3, 0.5, 0.5), mgp = c(2, 1, 0))
      plot(
        tb_plot$xlim, tb_plot$mean,
        type = "l", col = colours_samples[sample],
        xlim = 0:1, ylim = 0:1,
        xlab = latex2exp::TeX("$p^u$"),
        ylab = latex2exp::TeX("$E(D | p^u = p^c)$"),
        main = ""
      )
      polygon(
        c(tb_plot$xlim, rev(tb_plot$xlim)),
        c(tb_plot$lower, rev(tb_plot$upper)),
        col = adjustcolor(col = colours_samples[sample], alpha.f = .4),
        border = NA
      )
      segments(0, 0, 1, 1, col = "black", lty = 2)
    }
  }
}
plot_calib_locfit_simuls(calib_curve = calib_curves, type ="Regression")
plot_calib_locfit_simuls(calib_curve = calib_curves, type ="Classification")