Having trained an imputation model, complete() produces m completed datasets, returned as a list.
complete(mid_obj, m = 10L, unscale = TRUE, bin_label = TRUE, cat_coalesce = TRUE, fast = FALSE, file = NULL, file_root = NULL)
Arguments
mid_obj: Object of class midas, the result of running rMIDAS::train()
m: An integer, the number of completed datasets required
unscale: Boolean, indicating whether to unscale any columns that were previously minmax scaled between 0 and 1
bin_label: Boolean, indicating whether to add back labels for binary columns
cat_coalesce: Boolean, indicating whether to decode the one-hot encoded categorical variables
fast: Boolean, indicating whether to impute the category with the highest predicted probability (TRUE), or to use the predicted probabilities to draw a weighted sample of category levels (FALSE)
file: Path to save completed datasets. If NULL, completed datasets are only loaded into memory.
file_root: A character string, used as the root for all filenames when saving completed datasets if a filepath is supplied. If no file_root is provided, completed datasets will be saved as "file/midas_impute_yymmdd_hhmmss_m.csv"
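The difference between the two settings of fast can be illustrated in base R. This is a minimal sketch of the idea only, not the internal code of rMIDAS; the probability vector probs and its level names are made up for illustration.

```r
# Hypothetical predicted probabilities for one missing categorical cell
# (illustrative values, not produced by rMIDAS)
probs <- c(red = 0.2, yellow = 0.5, blue = 0.3)

# fast = TRUE analogue: take the level with the highest predicted probability
impute_argmax <- names(probs)[which.max(probs)]

# fast = FALSE analogue: draw one level, weighted by the predicted probabilities
set.seed(1)
impute_sampled <- sample(names(probs), size = 1, prob = probs)

# Over many draws, the sampled levels occur roughly in proportion to probs
draws <- sample(names(probs), size = 10000, replace = TRUE, prob = probs)
draw_freq <- prop.table(table(draws))[names(probs)]
```

Taking the most probable level always returns the same category for a given cell, whereas weighted sampling lets the imputed category vary from one completed dataset to the next.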
Returns
List of length m, each element of which is a completed data.frame (i.e. no missing values)
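A short sketch of how a list of this shape might be inspected. Because complete() needs a trained model, the list here is a hand-built stand-in; the object names (completed, has_na, stacked) and the imp index column are illustrative assumptions, not part of rMIDAS.

```r
# Stand-in for the output of complete(): a list of m data.frames
# (toy data built by hand, not produced by rMIDAS)
m <- 3
completed <- lapply(seq_len(m), function(i) {
  data.frame(b = 1:4, d = c(1.5, 2.5, 3.5, 4.5) + i / 10)
})

# One list element per completed dataset
n_datasets <- length(completed)

# Check that no completed dataset contains missing values
has_na <- vapply(completed, anyNA, logical(1))

# Stack into one long data.frame, with an index for the imputation number
stacked <- do.call(rbind, Map(function(df, i) cbind(imp = i, df),
                              completed, seq_along(completed)))
```

Individual datasets are accessed with the usual list indexing, for example completed[[1]].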
# Generate raw data, with numeric, binary, and categorical variables

## Not run:
# Run where Python available and configured correctly

# Load rMIDAS and data.table (the latter provides data.table())
library(rMIDAS)
library(data.table)

if (python_configured()) {
  set.seed(89)
  n_obs <- 10000
  raw_data <- data.table(a = sample(c("red", "yellow", "blue", NA), n_obs, replace = TRUE),
                         b = 1:n_obs,
                         c = sample(c("YES", "NO", NA), n_obs, replace = TRUE),
                         d = runif(n_obs, 1, 10),
                         e = sample(c("YES", "NO"), n_obs, replace = TRUE),
                         f = sample(c("male", "female", "trans", "other", NA), n_obs, replace = TRUE))

  # Names of bin./cat. variables
  test_bin <- c("c", "e")
  test_cat <- c("a", "f")

  # Pre-process data
  test_data <- convert(raw_data,
                       bin_cols = test_bin,
                       cat_cols = test_cat,
                       minmax_scale = TRUE)

  # Run imputations
  test_imp <- train(test_data)

  # Generate datasets
  complete_datasets <- complete(test_imp, m = 5, fast = FALSE)

  # Use Rubin's rules to combine m regression models
  midas_pool <- combine(formula = d ~ a + c + e + f, complete_datasets)
}
## End(Not run)
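The final step of the example pools the m regression models with Rubin's rules. A minimal base-R sketch of those rules for a single coefficient follows. The estimates and standard errors are made-up numbers, not output from combine(), and combine() may report further quantities beyond those shown here.

```r
# Hypothetical coefficient estimates and standard errors from m = 5 models,
# one per completed dataset (illustrative numbers only)
est <- c(0.52, 0.48, 0.55, 0.50, 0.45)
se  <- c(0.10, 0.11, 0.09, 0.10, 0.12)
m   <- length(est)

# Pooled point estimate: mean of the m estimates
pooled_est <- mean(est)

# Within-imputation variance: mean of the squared standard errors
within_var <- mean(se^2)

# Between-imputation variance: sample variance of the m estimates
between_var <- var(est)

# Total variance and pooled standard error
total_var <- within_var + (1 + 1 / m) * between_var
pooled_se <- sqrt(total_var)
```

The pooled standard error is larger than the average within-model standard error whenever the estimates differ across completed datasets, which is how the uncertainty due to missing data enters the final inference.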