featureImportance function

Feature Importance Plot

Creates a feature importance plot.

ggplotFeatureImportance(featureList, control = list(), ...)

plotFeatureImportance(featureList, control = list(), ...)

Arguments

  • featureList: [list]

    List of vectors of features. Each list element is expected to belong to one resampling iteration / fold.

  • control: [list]

    A list that stores additional configuration parameters (see the usage sketch after this argument list):

    • featimp.col_{high/medium/low}: Colors of the features that are used often, sometimes, or only a few times.
    • featimp.perc_{high/low}: Percentages of the total number of folds that define when a feature counts as being used often, sometimes, or only a few times.
    • featimp.las: Orientation of the axis labels (cf. the las argument of par).
    • featimp.lab_{feat/resample}: Axis labels (features and resample iterations).
    • featimp.string_angle: Angle of the feature labels on the x-axis.
    • featimp.pch_{active/inactive}: Plot symbol of the active and inactive points.
    • featimp.col_inactive: Color of the inactive points.
    • featimp.col_vertical: Color of the vertical lines.
    • featimp.lab_{title/strip}: Label used for the title and/or strip label. These parameters are only relevant for ggplotFeatureImportance.
    • featimp.legend_position: Location of the legend. This parameter is only relevant for ggplotFeatureImportance.
    • featimp.flip_axes: Should the axes be flipped? This parameter is only relevant for ggplotFeatureImportance.
    • featimp.plot_tiles: Visualize (non-)selected features with tiles? This parameter is only relevant for ggplotFeatureImportance.
  • ...: [any]

    Further arguments that can be passed to plot.
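
For illustration, here is a minimal usage sketch. The feature names and the three folds below are invented, and the control values merely override two of the parameters documented above:

# Hypothetical list of selected features from three resampling folds:
featureList = list(
  c("x1", "x2", "x4"),  # features selected in fold 1
  c("x1", "x4"),        # features selected in fold 2
  c("x1", "x2", "x3")   # features selected in fold 3
)

# Rotate the axis labels (las) and angle the feature strings on the x-axis:
plotFeatureImportance(featureList,
  control = list(featimp.las = 2, featimp.string_angle = 45))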

Returns

[plot].

A feature importance plot, indicating which feature was used during which resampling iteration.

Examples

## Not run:
# At the beginning, one needs a list of features, e.g. derived during a
# nested feature selection within mlr (see the following 8 steps):
library(mlr)
library(mlbench)
data(Glass)

# (1) Create a classification task:
classifTask = makeClassifTask(data = Glass, target = "Type")

# (2) Define the model (here, a classification tree):
lrn = makeLearner(cl = "classif.rpart")

# (3) Define the resampling strategy, which is supposed to be used within
# each inner loop of the nested feature selection:
innerResampling = makeResampleDesc("Holdout")

# (4) What kind of feature selection approach should be used? Here, we use
# a sequential backward strategy, i.e. starting from a model with all
# features, in each step the feature whose removal decreases the
# performance measure the least is removed from the model:
ctrl = makeFeatSelControlSequential(method = "sbs")

# (5) Wrap the original model (see (2)) in order to allow feature selection:
wrappedLearner = makeFeatSelWrapper(learner = lrn,
  resampling = innerResampling, control = ctrl)

# (6) Define a resampling strategy for the outer loop. This is necessary in
# order to assess whether the selected features depend on the underlying
# fold:
outerResampling = makeResampleDesc(method = "CV", iters = 10L)

# (7) Perform the feature selection:
featselResult = resample(learner = wrappedLearner, task = classifTask,
  resampling = outerResampling, models = TRUE)

# (8) Extract the features, which were selected during each iteration of
# the outer loop (i.e. during each of the 10 folds of the cross-validation):
featureList = lapply(featselResult$models,
  function(mod) getFeatSelResult(mod)$x)
## End(Not run)

########################################################################

# Now, one could inspect the features manually:
featureList

# Alternatively, one might use visual means such as the feature importance
# plot. There exist two versions of the feature importance plot: one based
# on classic R graphics
plotFeatureImportance(featureList)

# and one using ggplot
ggplotFeatureImportance(featureList)
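
The ggplot-based version can additionally be customized via the ggplot-only control parameters documented above. The following sketch assumes that featimp.lab_{title/strip} expands to featimp.lab_title / featimp.lab_strip; the chosen values are purely illustrative:

# Flip the axes, move the legend, and set a custom title
# (parameter expansion and values are assumptions for illustration):
ggplotFeatureImportance(featureList,
  control = list(
    featimp.flip_axes = TRUE,
    featimp.legend_position = "bottom",
    featimp.lab_title = "Features Selected per CV Fold"
  ))
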
  • Maintainer: Pascal Kerschke
  • License: BSD_2_clause + file LICENSE
  • Last published: 2020-03-31