complete() R function from rMIDAS

Impute missing values using a trained imputation model

Once an imputation model has been trained, complete() produces m completed datasets, returned as a list.


complete(
  mid_obj,
  m = 10L,
  unscale = TRUE,
  bin_label = TRUE,
  cat_coalesce = TRUE,
  fast = FALSE,
  file = NULL,
  file_root = NULL
)

Arguments

mid_obj: Object of class midas, the result of running rMIDAS::train()
m: An integer, the number of completed datasets required
unscale: Boolean, indicating whether to unscale any columns that were previously minmax scaled between 0 and 1
bin_label: Boolean, indicating whether to add back labels for binary columns
cat_coalesce: Boolean, indicating whether to decode the one-hot encoded categorical variables
fast: Boolean, indicating whether to impute the category with the highest predicted probability (TRUE), or to draw a category by sampling the levels, weighted by their predicted probabilities (FALSE)
file: Path to save completed datasets. If NULL, completed datasets are only loaded into memory.
file_root: A character string, used as the root for all filenames when saving completed datasets, if a path is supplied via file. If no file_root is provided, completed datasets are saved as "file/midas_impute_yymmdd_hhmmss_m.csv"
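
The two strategies selected by fast can be illustrated without a trained model. The base R sketch below uses a made-up vector of predicted probabilities (probs, a hypothetical name) for a single missing categorical cell. It is not rMIDAS's internal code, only an illustration of "take the most probable level" versus "sample levels weighted by probability". The last lines show, under the same caveat, what reversing a 0-1 minmax scaling amounts to when unscale = TRUE.

```r
# Hypothetical predicted probabilities for one missing categorical cell
probs <- c(red = 0.6, yellow = 0.3, blue = 0.1)

# Analogue of fast = TRUE: take the most probable category
impute_fast <- names(probs)[which.max(probs)]

# Analogue of fast = FALSE: draw one category, weighted by the probabilities
set.seed(1)
impute_sampled <- sample(names(probs), size = 1, prob = probs)

# Over many draws, the sampled shares approximate the probabilities
draws <- sample(names(probs), size = 10000, replace = TRUE, prob = probs)
draw_share <- prop.table(table(draws))[names(probs)]

# Analogue of unscale = TRUE: undo a minmax scaling to [0, 1]
x <- c(1, 4, 10)
x_scaled <- (x - min(x)) / (max(x) - min(x))
x_back <- x_scaled * (max(x) - min(x)) + min(x)
```

Sampling (fast = FALSE) keeps the uncertainty in the predicted probabilities visible across the m datasets, whereas always taking the most probable level would assign the same category in every one of them.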

Returns

List of length m, each element of which is a completed data.frame (i.e. no missing values)
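
Because the return value is an ordinary list of data.frames, it can be processed with standard base R tools. The sketch below uses a small hand-made stand-in for the output (the object completed and its values are hypothetical, not produced by rMIDAS) to show how one might check completeness and summarise a column across the m datasets.

```r
# Stand-in for the output of complete(): a list of m = 3 completed data.frames
# (values are made up for illustration)
completed <- list(
  data.frame(d = c(1.0, 2.0, 3.0), e = c("YES", "NO", "YES")),
  data.frame(d = c(1.0, 2.5, 3.0), e = c("YES", "NO", "NO")),
  data.frame(d = c(1.0, 1.5, 3.0), e = c("YES", "YES", "YES"))
)

# Number of completed datasets
m <- length(completed)

# Confirm that no dataset contains missing values
all_complete <- all(vapply(completed, function(df) !anyNA(df), logical(1)))

# Mean of column d within each dataset, then averaged across datasets
mean_d <- vapply(completed, function(df) mean(df$d), numeric(1))
pooled_mean_d <- mean(mean_d)
```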

Details

For more information, see Lall and Robinson (2023): doi:10.18637/jss.v107.i09.

Examples


## Not run:

library(rMIDAS)
library(data.table)

# Run only where Python is available and configured correctly
if (python_configured()) {
  # Generate raw data with numeric, binary, and categorical variables
  set.seed(89)
  n_obs <- 10000
  raw_data <- data.table(
    a = sample(c("red", "yellow", "blue", NA), n_obs, replace = TRUE),
    b = 1:n_obs,
    c = sample(c("YES", "NO", NA), n_obs, replace = TRUE),
    d = runif(n_obs, 1, 10),
    e = sample(c("YES", "NO"), n_obs, replace = TRUE),
    f = sample(c("male", "female", "trans", "other", NA), n_obs, replace = TRUE)
  )

  # Names of binary and categorical variables
  test_bin <- c("c", "e")
  test_cat <- c("a", "f")

  # Pre-process data
  test_data <- convert(raw_data,
                       bin_cols = test_bin,
                       cat_cols = test_cat,
                       minmax_scale = TRUE)

  # Train the imputation model
  test_imp <- train(test_data)

  # Generate m = 5 completed datasets
  complete_datasets <- complete(test_imp, m = 5, fast = FALSE)

  # Use Rubin's rules to combine m regression models
  midas_pool <- combine(formula = d ~ a + c + e + f,
                        complete_datasets)
}
## End(Not run)
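
The last step of the example pools the m regression fits using Rubin's rules. As a rough illustration of what that pooling involves, the base R sketch below applies the standard formulas to made-up estimates and variances for a single coefficient (est and se2 are hypothetical values). It is not the code used by combine(), and it leaves out any degrees-of-freedom adjustment.

```r
# Hypothetical estimates and variances (squared standard errors) for one
# coefficient, from the same model fitted to m = 5 completed datasets
est <- c(0.50, 0.55, 0.45, 0.52, 0.48)
se2 <- c(0.010, 0.012, 0.011, 0.009, 0.010)
m <- length(est)

# Pooled point estimate: average of the m estimates
q_bar <- mean(est)

# Within-imputation variance: average of the m variances
w_bar <- mean(se2)

# Between-imputation variance: sample variance of the m estimates
b <- var(est)

# Total variance and pooled standard error
total_var <- w_bar + (1 + 1 / m) * b
pooled_se <- sqrt(total_var)
```

The between-imputation term is what distinguishes this from fitting a model to a single completed dataset: it inflates the standard error to reflect uncertainty about the imputed values.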

References

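Lall R, Robinson T (2023). "Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS." Journal of Statistical Software, 107(9). doi:10.18637/jss.v107.i09.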

rMIDAS package

Maintainer: Thomas Robinson
License: Apache License (>= 2.0)
Last published: 2023-10-11