impute_features() R function from Rforestry

Feature imputation using random forest neighborhoods

This function uses the neighborhoods implied by a random forest to impute missing features. The neighbors of a data point are all the training points assigned to the same leaf as that point in at least one tree of the forest. The weight of each neighbor is the fraction of trees in the forest in which it shares a leaf with the point. A missing feature is imputed for a point as the average of that feature over the point's neighbors, weighted by these neighborhood weights.
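
The weighting described above can be illustrated with a small base R sketch. It does not call Rforestry and is not the package's implementation: the leaf assignments in `leaf_ids` and the values in `feature_train` are made up, and dropping neighbors whose own value is missing is an assumption made for this illustration.

```r
# Hypothetical leaf assignments: one row per point, one column per tree.
# Rows 1-4 are training points; row 5 is the point with a missing feature.
leaf_ids <- rbind(
  c(1, 1, 2),
  c(1, 2, 2),
  c(2, 2, 1),
  c(2, 1, 1),
  c(1, 1, 1)
)
train_leaves <- leaf_ids[1:4, , drop = FALSE]
query_leaves <- leaf_ids[5, ]

# Made-up observed values of the feature for the four training points.
feature_train <- c(10, 20, NA, 40)

# For each tree (column), mark the training points that share the query's leaf.
shares_leaf <- sweep(train_leaves, 2, query_leaves, FUN = "==")

# Weight = fraction of trees in which the training point shares a leaf.
weights <- rowMeans(shares_leaf)

# Assumption for this sketch: only neighbors (weight > 0) with an
# observed value contribute to the weighted average.
usable <- weights > 0 & !is.na(feature_train)
imputed <- sum(weights[usable] * feature_train[usable]) / sum(weights[usable])
```

Here the query point shares a leaf with training points 1 and 4 in two of the three trees and with points 2 and 3 in one tree, so points 1 and 4 receive twice the weight of points 2 and 3.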


impute_features(
  object,
  feature.new,
  seed = round(runif(1) * 10000),
  use_mean_imputation_fallback = FALSE
)

Arguments

object: an object of class forestry
feature.new: the feature data.frame we will impute
seed: a random seed passed to the predict method of forestry
use_mean_imputation_fallback: if TRUE, mean imputation (for numeric variables) and mode imputation (for factor variables) are used for missing features for which all neighbors also have the corresponding feature missing; if FALSE, these missing features remain NA in the data frame returned by impute_features.
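
As a rough picture of what a mean and mode fallback does, the following base R sketch fills the remaining NAs column by column. `fallback_impute` is a hypothetical helper written for this illustration, not a function from Rforestry, and computing the mean and mode from the same data frame is an assumption; the page does not state which data the package uses for them.

```r
# Hypothetical helper, not part of Rforestry: fill the NAs left in each column
# with the column mean (numeric) or the most frequent level (factor).
fallback_impute <- function(df) {
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    # Nothing to fill, or nothing observed to fill from.
    if (!any(missing) || all(missing)) next
    if (is.numeric(df[[col]])) {
      df[[col]][missing] <- mean(df[[col]], na.rm = TRUE)
    } else if (is.factor(df[[col]])) {
      counts <- table(df[[col]])  # table() ignores NA by default
      df[[col]][missing] <- names(counts)[which.max(counts)]
    }
  }
  df
}

# Made-up data with one missing numeric value and one missing factor value.
toy <- data.frame(
  size = c(1, 2, NA, 3),
  kind = factor(c("a", "b", "b", NA))
)
filled <- fallback_impute(toy)
```

In this toy data the missing `size` becomes the mean of the observed values and the missing `kind` becomes the most frequent level.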

Returns

A data.frame equal to feature.new, with its missing values imputed.

Examples


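library(Rforestry)

# Copy iris and blank out some entries in a factor and a numeric column
iris_with_missing <- iris
idx_miss_factor <- sample(nrow(iris), 25, replace = TRUE)
iris_with_missing[idx_miss_factor, 5] <- NA    # Species
idx_miss_numeric <- sample(nrow(iris), 25, replace = TRUE)
iris_with_missing[idx_miss_numeric, 3] <- NA   # Petal.Length

# Sepal.Length (column 1) is the outcome; the remaining columns are features
x <- iris_with_missing[, -1]
y <- iris_with_missing[, 1]

# Fit the forest, then impute the missing feature values
forest <- forestry(x, y, ntree = 500, seed = 2)
imputed_x <- impute_features(forest, x, seed = 2)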
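
One way to check what an imputation changed is to compare NA counts per column before and after. The sketch below uses only base R; `na_report` is a hypothetical helper, not part of Rforestry, and the `before` and `after` data frames are made up for illustration.

```r
# Hypothetical helper: count NAs per column before and after imputation.
na_report <- function(before, after) {
  stopifnot(identical(names(before), names(after)))
  data.frame(
    column = names(before),
    na_before = unname(colSums(is.na(before))),
    na_after = unname(colSums(is.na(after))),
    stringsAsFactors = FALSE
  )
}

# Made-up data: column b still has one NA after "imputation".
before <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "x"))
after <- data.frame(a = c(1, 2, 3), b = c("x", NA, "x"))
report <- na_report(before, after)
```

Applied to the example objects as na_report(x, imputed_x), any nonzero na_after entry would point to values that stayed NA, which is the case use_mean_imputation_fallback = TRUE is meant to cover.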

Rforestry package

Maintainer: Theo Saarinen
License: GPL (>= 3) | file LICENSE
Last published: 2025-03-15
