acc_distributions_prop function

Plots and checks for distributions -- Proportion

Data quality indicator checks "Unexpected location" and "Unexpected proportion" with histograms.

Indicator

acc_distributions_prop( resp_vars = NULL, study_data, label_col, item_level = "item_level", check_param = "proportion", plot_ranges = TRUE, flip_mode = "noflip", meta_data = item_level, meta_data_v2 )

Arguments

  • resp_vars: variable list the names of the measurement variables
  • study_data: data.frame the data frame that contains the measurements
  • label_col: variable attribute the name of the column in the metadata with labels of variables
  • item_level: data.frame the data frame that contains metadata attributes of study data
  • check_param: enum any | location | proportion. Which type of check should be conducted (if possible): a check on the location of the mean or median value of the study data, a check on proportions of categories, or either of them if the necessary metadata is available.
  • plot_ranges: logical Should the plot show ranges and results from the data quality checks? (default: TRUE)
  • flip_mode: enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped, or auto-flipped? Not all options are always supported. In general, this can be controlled by setting the R option with options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set it for individual functions using specific_args.
  • meta_data: data.frame old name for item_level
  • meta_data_v2: character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.
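
As a small illustration of the flip_mode argument, the sketch below sets the flip mode globally instead of per call. It assumes that dataquieR.flip_mode is read as a regular R option through base R's options() and getOption(); it uses base R only and does not call dataquieR itself.

```r
# Assumption: the package looks up the flip mode via the R option
# "dataquieR.flip_mode"; only base R functions are used here.

# Set the option and keep the previous state so it can be restored.
old <- options(dataquieR.flip_mode = "noflip")

# Read the value back, as a function consulting the option would.
current <- getOption("dataquieR.flip_mode")
print(current)

# Restore whatever was set before.
options(old)
```

Restoring the previous state with options(old) keeps the setting from leaking into later plots in the same session.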

Returns

A list with:

  • SummaryTable: data.frame containing data quality checks for "Unexpected location" (FLG_acc_ud_loc) and "Unexpected proportion" (FLG_acc_ud_prop) for each response variable in resp_vars.
  • SummaryData: a data.frame containing data quality checks for "Unexpected location" and / or "Unexpected proportion" for a report
  • SummaryPlotList: list of ggplot2::ggplot objects, one for each response variable in resp_vars.

Algorithm of this implementation:

  • If no response variable is defined, select all variables of type float or integer in the study data.
  • Remove missing codes from the study data (if defined in the metadata).
  • Remove measurements deviating from (hard) limits defined in the metadata (if defined).
  • Exclude variables containing only NA or only one unique value (excluding NAs).
  • Perform check for "Unexpected location" if defined in the metadata (needs a LOCATION_METRIC (mean or median) and LOCATION_RANGE (range of expected values for the mean and median, respectively)).
  • Perform check for "Unexpected proportion" if defined in the metadata (needs PROPORTION_RANGE (range of expected values for the proportions of the categories)).
  • Plot histogram(s).
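
The steps above can be sketched in base R. This is a simplified illustration and not the package's implementation: the helper names (clean_values, is_checkable, check_location, check_proportion), the example data, and the representation of ranges as numeric vectors are all assumptions made for this sketch. In dataquieR, the ranges come from the metadata columns LOCATION_RANGE and PROPORTION_RANGE.

```r
# Sketch only: base R stand-ins for the listed steps, not dataquieR code.
# All names and values below are illustrative assumptions.
x <- c(12, 15, 14, 99999, 13, 250, 16, NA, 15)  # raw measurements
missing_codes <- c(99999)                        # codes meaning "missing"
hard_limits <- c(0, 100)                         # admissible value range
location_metric <- "median"                      # "mean" or "median"
location_range <- c(10, 20)                      # expected range of the metric

# Replace missing codes and values outside the hard limits by NA.
clean_values <- function(x, missing_codes, hard_limits) {
  x[x %in% missing_codes] <- NA
  x[!is.na(x) & (x < hard_limits[1] | x > hard_limits[2])] <- NA
  x
}

# A variable is only checked if it has more than one distinct non-NA value.
is_checkable <- function(x) length(unique(x[!is.na(x)])) > 1

# "Unexpected location": flag is TRUE if the metric leaves the expected range.
check_location <- function(x, metric, range) {
  value <- switch(metric,
                  mean = mean(x, na.rm = TRUE),
                  median = median(x, na.rm = TRUE),
                  stop("metric must be 'mean' or 'median'"))
  list(value = value, flag = value < range[1] || value > range[2])
}

# "Unexpected proportion": per category, flag is TRUE if the observed
# percentage (NAs excluded) leaves the expected range for that category.
check_proportion <- function(x, ranges) {
  counts <- table(factor(x, levels = names(ranges)))
  observed <- setNames(100 * as.numeric(counts) / sum(counts), names(ranges))
  vapply(names(ranges), function(level) {
    p <- observed[[level]]
    p < ranges[[level]][1] || p > ranges[[level]][2]
  }, logical(1))
}

x_clean <- clean_values(x, missing_codes, hard_limits)
if (is_checkable(x_clean)) {
  loc <- check_location(x_clean, location_metric, location_range)
}

# Categorical example: expected percentage ranges per category.
smoker <- c(0, 0, 1, 0, 1, 0, 0, 0, NA, 0)
prop_flags <- check_proportion(smoker, list("0" = c(60, 80), "1" = c(25, 40)))

# Histogram bins are computed without drawing; drop plot = FALSE to draw.
h <- hist(x_clean, plot = FALSE)
```

In this example the median of the cleaned values lies inside the expected range, so the location is not flagged, while category "1" of the hypothetical smoker variable falls below its expected proportion and is flagged.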

See Also

  • Maintainer: Stephan Struckmann
  • License: BSD_2_clause + file LICENSE
  • Last published: 2025-03-05