featureImportance function

Feature Importance Plot

Creates a feature importance plot.

ggplotFeatureImportance(featureList, control = list(), ...)

plotFeatureImportance(featureList, control = list(), ...)

Arguments

  • featureList: [list]

    List of vectors of features. Each list element is expected to belong to one resampling iteration / fold.

  • control: [list]

    A list that stores additional configuration parameters (see the usage sketch after this argument list):

    • featimp.col_{high/medium/low}: Colors of the features that are used often, sometimes, or only a few times.
    • featimp.perc_{high/low}: Percentages of the total number of folds that define when a feature counts as being used often, sometimes, or only a few times.
    • featimp.las: Orientation of the axis labels (cf. the las argument of par).
    • featimp.lab_{feat/resample}: Axis labels (features and resample iterations).
    • featimp.string_angle: Angle of the feature labels on the x-axis.
    • featimp.pch_{active/inactive}: Plot symbol of the active and inactive points.
    • featimp.col_inactive: Color of the inactive points.
    • featimp.col_vertical: Color of the vertical lines.
    • featimp.lab_{title/strip}: Label used for the title and/or strip label. These parameters are only relevant for ggplotFeatureImportance.
    • featimp.legend_position: Location of the legend. This parameter is only relevant for ggplotFeatureImportance.
    • featimp.flip_axes: Should the axes be flipped? This parameter is only relevant for ggplotFeatureImportance.
    • featimp.plot_tiles: Visualize (non-)selected features with tiles? This parameter is only relevant for ggplotFeatureImportance.
  • ...: [any]

    Further arguments that can be passed to plot.
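
For illustration, here is a minimal usage sketch. The feature names and the three folds below are invented, and the control values merely override two of the parameters documented above:

# Hypothetical list of selected features from three resampling folds:
featureList = list(
  c("x1", "x2", "x4"),  # features selected in fold 1
  c("x1", "x4"),        # features selected in fold 2
  c("x1", "x2", "x3")   # features selected in fold 3
)

# Rotate the axis labels (las) and angle the feature strings on the x-axis:
plotFeatureImportance(featureList,
  control = list(featimp.las = 2, featimp.string_angle = 45))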

Returns

[plot].

A feature importance plot, indicating which feature was used during which resampling iteration.

Examples

## Not run:
# At the beginning, one needs a list of features, e.g. derived during a
# nested feature selection within mlr (see the following 8 steps):
library(mlr)
library(mlbench)
data(Glass)

# (1) Create a classification task:
classifTask = makeClassifTask(data = Glass, target = "Type")

# (2) Define the model (here, a classification tree):
lrn = makeLearner(cl = "classif.rpart")

# (3) Define the resampling strategy, which is supposed to be used within
# each inner loop of the nested feature selection:
innerResampling = makeResampleDesc("Holdout")

# (4) What kind of feature selection approach should be used? Here, we use
# a sequential backward strategy, i.e. starting from a model with all
# features, in each step the feature whose removal decreases the
# performance measure the least is removed from the model:
ctrl = makeFeatSelControlSequential(method = "sbs")

# (5) Wrap the original model (see (2)) in order to allow feature selection:
wrappedLearner = makeFeatSelWrapper(learner = lrn,
  resampling = innerResampling, control = ctrl)

# (6) Define a resampling strategy for the outer loop. This is necessary in
# order to assess whether the selected features depend on the underlying
# fold:
outerResampling = makeResampleDesc(method = "CV", iters = 10L)

# (7) Perform the feature selection:
featselResult = resample(learner = wrappedLearner, task = classifTask,
  resampling = outerResampling, models = TRUE)

# (8) Extract the features, which were selected during each iteration of
# the outer loop (i.e. during each of the 10 folds of the cross-validation):
featureList = lapply(featselResult$models,
  function(mod) getFeatSelResult(mod)$x)
## End(Not run)

########################################################################

# Now, one could inspect the features manually:
featureList

# Alternatively, one might use visual means such as the feature importance
# plot. There exist two versions of the feature importance plot: one based
# on classic R graphics
plotFeatureImportance(featureList)

# and one using ggplot
ggplotFeatureImportance(featureList)
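
The ggplot-based version can additionally be customized via the ggplot-only control parameters documented above. The following sketch assumes that featimp.lab_{title/strip} expands to featimp.lab_title / featimp.lab_strip; the chosen values are purely illustrative:

# Flip the axes, move the legend, and set a custom title
# (parameter expansion and values are assumptions for illustration):
ggplotFeatureImportance(featureList,
  control = list(
    featimp.flip_axes = TRUE,
    featimp.legend_position = "bottom",
    featimp.lab_title = "Features Selected per CV Fold"
  ))
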
  • Maintainer: Pascal Kerschke
  • License: BSD_2_clause + file LICENSE
  • Last published: 2020-03-31