complete function

Impute missing values using imputation model

Having trained an imputation model, complete() generates m completed datasets and returns them as a list. If file is supplied, the completed datasets are also written to disk as CSV files.

complete(
  mid_obj,
  m = 10L,
  unscale = TRUE,
  bin_label = TRUE,
  cat_coalesce = TRUE,
  fast = FALSE,
  file = NULL,
  file_root = NULL
)

Arguments

  • mid_obj: Object of class midas, the result of running rMIDAS::train()
  • m: An integer, the number of completed datasets required
  • unscale: Boolean, indicating whether to unscale any columns that were previously minmax scaled between 0 and 1
  • bin_label: Boolean, indicating whether to add back labels for binary columns
  • cat_coalesce: Boolean, indicating whether to decode the one-hot encoded categorical variables
  • fast: Boolean, indicating whether to impute each categorical value with the level that has the highest predicted probability (TRUE), or to draw a weighted sample from the category levels using the predicted probabilities as weights (FALSE)
  • file: Path to save completed datasets. If NULL, completed datasets are only loaded into memory.
  • file_root: A character string, used as the root for all filenames when saving completed datasets if a filepath is supplied. If no file_root is provided, completed datasets will be saved as "file/midas_impute_yymmdd_hhmmss_m.csv"
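The difference between the two settings of fast can be illustrated in base R. This is only a sketch of the idea, not rMIDAS's internal code: the probability vector probs is invented, standing in for the model's predicted probabilities for a single missing categorical cell.

```r
# Hypothetical predicted probabilities for one missing cell of a
# categorical variable (values invented for illustration).
probs <- c(red = 0.2, yellow = 0.5, blue = 0.3)

# Analogue of fast = TRUE: take the level with the highest probability.
impute_fast <- names(probs)[which.max(probs)]

# Analogue of fast = FALSE: draw one level, weighted by probability.
set.seed(1)
impute_sampled <- sample(names(probs), size = 1, prob = probs)

# Repeating the weighted draw shows that every level can be imputed,
# roughly in proportion to its predicted probability.
draws <- sample(names(probs), size = 1000, replace = TRUE, prob = probs)
draw_share <- table(draws) / length(draws)
```

With fast = TRUE the same level would be imputed in every completed dataset for a given set of probabilities, whereas the weighted draw lets imputations vary across the m datasets.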

Returns

List of length m, each element of which is a completed data.frame (i.e. no missing values)
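Because the return value is an ordinary list of data.frames, it can be inspected with base R tools. The sketch below uses an invented stand-in list, mock_completed, rather than real output, since a real call to complete() needs a trained midas object and a configured Python environment.

```r
# Stand-in for the output of complete(): a list of m data.frames.
# The data are invented for illustration.
m <- 3
mock_completed <- lapply(seq_len(m), function(i) {
  data.frame(b = 1:4, d = c(1.5, 2.5, 3.5, 4.5) + i / 10)
})

# One list element per completed dataset.
n_datasets <- length(mock_completed)

# TRUE if none of the datasets contains a missing value.
all_complete <- !any(vapply(mock_completed, anyNA, logical(1)))

# Each element can be analysed on its own, e.g. the mean of column d.
d_means <- vapply(mock_completed, function(df) mean(df$d), numeric(1))
```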

Details

For more information, see Lall and Robinson (2023): doi:10.18637/jss.v107.i09.

Examples

## Not run: 
# data.table() is provided by the data.table package
library(data.table)
library(rMIDAS)

# Run where Python available and configured correctly
if (python_configured()) {

  # Generate raw data, with numeric, binary, and categorical variables
  set.seed(89)
  n_obs <- 10000
  raw_data <- data.table(a = sample(c("red","yellow","blue",NA), n_obs, replace = TRUE),
                         b = 1:n_obs,
                         c = sample(c("YES","NO",NA), n_obs, replace = TRUE),
                         d = runif(n_obs, 1, 10),
                         e = sample(c("YES","NO"), n_obs, replace = TRUE),
                         f = sample(c("male","female","trans","other",NA), n_obs, replace = TRUE))

  # Names of bin./cat. variables
  test_bin <- c("c","e")
  test_cat <- c("a","f")

  # Pre-process data
  test_data <- convert(raw_data,
                       bin_cols = test_bin,
                       cat_cols = test_cat,
                       minmax_scale = TRUE)

  # Run imputations
  test_imp <- train(test_data)

  # Generate datasets
  complete_datasets <- complete(test_imp, m = 5, fast = FALSE)

  # Use Rubin's rules to combine m regression models
  midas_pool <- combine(formula = d~a+c+e+f, complete_datasets)
}
## End(Not run)
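For readers unfamiliar with Rubin's rules, the pooling step can be sketched in base R. This is a minimal illustration of the standard formulas for one coefficient, not the implementation behind combine(). The datasets in mock_completed are simulated independently as stand-ins for real completed datasets, and all variable names are hypothetical.

```r
set.seed(89)
m <- 5

# Stand-in "completed" datasets (simulated; real ones come from complete()).
mock_completed <- lapply(seq_len(m), function(i) {
  x <- rnorm(200)
  data.frame(x = x, y = 2 * x + rnorm(200))
})

# Fit the same regression model on each dataset.
fits <- lapply(mock_completed, function(df) lm(y ~ x, data = df))

# Per-dataset estimate and sampling variance of the slope on x.
est <- vapply(fits, function(f) coef(f)[["x"]], numeric(1))
se2 <- vapply(fits, function(f) vcov(f)["x", "x"], numeric(1))

q_bar <- mean(est)                # pooled point estimate
w_bar <- mean(se2)                # within-imputation variance
b     <- var(est)                 # between-imputation variance
t_var <- w_bar + (1 + 1 / m) * b  # total variance
pooled_se <- sqrt(t_var)          # pooled standard error
```

The total variance adds the between-imputation component to the average within-imputation variance, so the pooled standard error reflects the extra uncertainty due to the missing data.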

References

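Lall, R. and Robinson, T. (2023). Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS. Journal of Statistical Software, 107(9). doi:10.18637/jss.v107.i09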

  • Maintainer: Thomas Robinson
  • License: Apache License (>= 2.0)
  • Last published: 2023-10-11