marginalize_attr function

Marginalize synthetic attributes

Marginalize (i.e., reduce in number) the attributes of a synthetic dataset of class 'micro_synthetic', or of a list of synthetic datasets of class 'synthACS'. This is done by marginalizing the joint distribution over a specified set of attributes (see Arguments below).

marginalize_attr(obj, varlist, marginalize_out = FALSE)

Arguments

  • obj: An object of class "micro_synthetic", or a list of such datasets of class "synthACS".

  • varlist: A character vector of variable, or attribute, names in obj.

  • marginalize_out: Logical. If TRUE, the variables in varlist are removed instead of kept. Defaults to FALSE.

Examples

{
# dummy data setup
set.seed(567L)
df <- data.frame(gender= factor(sample(c("male", "female"), size= 100, replace= TRUE)),
                 age= factor(sample(1:5, size= 100, replace= TRUE)),
                 pov= factor(sample(c("below poverty", "at above poverty"),
                                    size= 100, replace= TRUE, prob= c(.15,.85))),
                 p= runif(100))
df$p <- df$p / sum(df$p)
class(df) <- c("data.frame", "micro_synthetic")

df2 <- marginalize_attr(df, varlist= "gender")
df3 <- marginalize_attr(df, varlist= c("gender", "age"))
df4 <- marginalize_attr(df, varlist= c("gender", "age"), marginalize_out= TRUE)

df_list <- replicate(10, df, simplify= FALSE)
dummy_list <- replicate(10, list(NULL), simplify= FALSE)
df_list <- mapply(function(a,b) {return(list(a, b))}, a= dummy_list, b= df_list, SIMPLIFY = FALSE)
class(df_list) <- c("list", "synthACS")

# run the function
df_list2 <- marginalize_attr(df_list, varlist= c("gender", "age"))
}
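
To illustrate what marginalizing the joint distribution means, the following is a minimal base R sketch. It assumes that marginalizing amounts to summing the probability column p within each combination of the kept attributes. It does not call marginalize_attr, and the package's actual implementation may differ. The name marg is hypothetical and used only for illustration.

```r
# Conceptual sketch only: base R, not the synthACS implementation.
set.seed(567L)
df <- data.frame(
  gender = factor(sample(c("male", "female"), size = 100, replace = TRUE)),
  age    = factor(sample(1:5, size = 100, replace = TRUE)),
  p      = runif(100)
)
df$p <- df$p / sum(df$p)  # normalize so the probabilities sum to 1

# Keep only `gender`: sum p over all rows that share the same gender,
# which sums out every other attribute (here, `age`).
# Assumption: this mirrors keeping varlist = "gender".
marg <- aggregate(p ~ gender, data = df, FUN = sum)
print(marg)
```

Under this assumption the result has one row per level of the kept attribute, and its p column still sums to 1, since the total probability mass is only regrouped.
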
  • Maintainer: Alex Whitworth
  • License: MIT + file LICENSE
  • Last published: 2022-10-26