This R-package computes the Conditional Permutation Importance (CPI; Strobl, 2008) using an alternative implementation that is both faster and more stable (Debeer & Strobl, 2020). The (C)PI can be computed for random forests fit using (a) the original impurity reduction method (randomForest-package), and (b) the Conditional Inference framework (party-package). In addition, a plotting method for the resulting VarImp-object is included.
Installation
The package can be installed using the devtools-package:
install.packages("devtools")
devtools::install_github("ddebeer/permimp")
Documentation
The workhorse is the permimp-function.
?permimp
For documentation about the plotting function:
?plot.VarImp
Example
library(party)
library(randomForest)
library(permimp)
### set seed
set.seed(542863)
### get example data
airq <- subset(airquality, !(is.na(Ozone) | is.na(Solar.R)))
### fit a random forest
### ... using the party package
cfAirq5 <- cforest(Ozone ~ ., data = airq,
                   control = cforest_unbiased(mtry = 3, ntree = 1000,
                                              minbucket = 5,
                                              minsplit = 10))
### compute the conditional permutation importance
permimp_cf <- permimp(cfAirq5, conditional = TRUE)
plot(permimp_cf, type = "box", interval = "quantile")
### fit a random forest ...
### ... using the randomForest package
rfAirq5 <- randomForest(Ozone ~ ., data = airq,
                        mtry = 3, ntree = 1000, importance = TRUE,
                        keep.forest = TRUE, keep.inbag = TRUE)
### compute the conditional permutation importance
permimp_rf <- permimp(rfAirq5, conditional = TRUE)
plot(permimp_rf, horizontal = TRUE)
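Because the package covers both the conditional and the unconditional permutation importance, it can be informative to compute both on the same forest and compare them. The following is a minimal sketch under some assumptions: conditional = FALSE is taken to request the unconditional importance, the per-predictor importances are assumed to be stored in the $values element of the VarImp-object, and the object names (rfAirq, pi_rf, cpi_rf) and the smaller ntree are chosen for illustration only.

```r
library(randomForest)
library(permimp)

set.seed(542863)
airq <- subset(airquality, !(is.na(Ozone) | is.na(Solar.R)))

# Fit a forest with the same settings as in the example, but fewer trees
# to keep the sketch quick (ntree = 500 is an arbitrary choice).
rfAirq <- randomForest(Ozone ~ ., data = airq,
                       mtry = 3, ntree = 500, importance = TRUE,
                       keep.forest = TRUE, keep.inbag = TRUE)

# Unconditional and conditional permutation importance on the same forest.
# Assumption: conditional = FALSE gives the unconditional importance.
pi_rf  <- permimp(rfAirq, conditional = FALSE)
cpi_rf <- permimp(rfAirq, conditional = TRUE)

# Assumption: the VarImp-object stores the importances in $values.
round(cbind(PI = pi_rf$values, CPI = cpi_rf$values), 2)
```

Predictors that are correlated with other predictors will often show a larger gap between the two columns, since the conditional version permutes within groups defined by the other predictors.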
Parallel Processing
For forests with large trees, parallel processing may speed up the computations. Parallel processing is possible via the cl argument. Under the hood, the pblapply function from the pbapply-package is used.
Tip: when using parallel processing, set progressBar = FALSE. The additional communication between the nodes for updating the progress bar will slow down the computations.
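A parallel run could look roughly like the sketch below. It assumes that the cl argument accepts a cluster object created with parallel::makeCluster(), in the same way pbapply::pblapply() does; the number of workers (2), the smaller ntree and the object names (rfAirq, permimp_par) are arbitrary choices for illustration.

```r
library(parallel)
library(randomForest)
library(permimp)

set.seed(542863)
airq <- subset(airquality, !(is.na(Ozone) | is.na(Solar.R)))

# Fit a forest as in the example (fewer trees to keep the sketch quick).
rfAirq <- randomForest(Ozone ~ ., data = airq,
                       mtry = 3, ntree = 500, importance = TRUE,
                       keep.forest = TRUE, keep.inbag = TRUE)

# Assumption: `cl` takes a cluster from parallel::makeCluster(),
# which is what pbapply::pblapply() accepts.
cl <- makeCluster(2)

# Switch the progress bar off to avoid extra communication between nodes.
permimp_par <- permimp(rfAirq, conditional = TRUE,
                       cl = cl, progressBar = FALSE)

# Always release the workers when done.
stopCluster(cl)

plot(permimp_par, horizontal = TRUE)
```

For small forests such as this one, the overhead of starting the workers may outweigh any gain; the benefit is expected mainly for forests with many large trees.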