acc_shape_or_scale function

Compare observed versus expected distributions

This implementation contrasts the empirical distribution of a measurement variable against an assumed distribution. The approach is adapted from the idea of rootograms (Tukey 1977), which is also applicable to count data (Kleiber and Zeileis 2016).

Indicator

acc_shape_or_scale( resp_vars, study_data, label_col, item_level = "item_level", dist_col, guess, par1, par2, end_digits, flip_mode = "noflip", meta_data = item_level, meta_data_v2 )

Arguments

  • resp_vars: variable the name of the continuous measurement variable
  • study_data: data.frame the data frame that contains the measurements
  • label_col: variable attribute the name of the column in the metadata with labels of variables
  • item_level: data.frame the data frame that contains metadata attributes of study data
  • dist_col: variable attribute the name of the variable attribute in meta_data that provides the expected distribution of a study variable
  • guess: logical whether to estimate the distribution parameters from the data
  • par1: numeric first parameter of the distribution if applicable
  • par2: numeric second parameter of the distribution if applicable
  • end_digits: logical internal use; check for end-digit preferences
  • flip_mode: enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped or auto-flipped? Not all options are always supported. In general, this can be controlled by setting options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set it for specific calls using specific_args.
  • meta_data: data.frame old name for item_level
  • meta_data_v2: character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.
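
As a rough illustration of how the arguments fit together, the following sketch builds a tiny toy data set and calls the function on it. The data, the variable name v1, the label SBP and the metadata column name DISTRIBUTION are assumptions made up for this example. Real item-level metadata usually carries more attributes, so the call is guarded and wrapped in try() rather than presented as a guaranteed working setup.

```r
# Toy study data: one continuous measurement (all values are simulated).
set.seed(1)
study_data <- data.frame(v1 = round(rnorm(200, mean = 120, sd = 15), 1))

# Minimal item-level metadata (assumed column names; real metadata
# typically contains further attributes such as missing codes).
item_level <- data.frame(
  VAR_NAMES = "v1",
  LABEL = "SBP",
  DATA_TYPE = "float",
  DISTRIBUTION = "normal",   # expected distribution of the variable
  stringsAsFactors = FALSE
)

# Only attempt the call if dataquieR is installed.
if (requireNamespace("dataquieR", quietly = TRUE)) {
  res <- try(
    dataquieR::acc_shape_or_scale(
      resp_vars  = "SBP",           # addressed by its label
      study_data = study_data,
      item_level = item_level,
      label_col  = "LABEL",
      dist_col   = "DISTRIBUTION",
      guess      = TRUE             # estimate parameters from the data
    ),
    silent = TRUE
  )
  if (!inherits(res, "try-error")) print(res$SummaryTable)
}
```

With guess = TRUE, par1 and par2 are left out; to fix the parameters by hand, one would presumably set guess = FALSE and pass both values instead.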

Returns

a list with:

  • ResultData: data.frame underlying the plot
  • SummaryPlot: ggplot2::ggplot probability distribution plot
  • SummaryTable: data.frame with the columns Variables and FLG_acc_ud_shape

ALGORITHM OF THIS IMPLEMENTATION:

  • This implementation is restricted to data of type float or integer.
  • Missing codes are removed from resp_vars (if defined in the metadata)
  • The user must specify the column of the metadata that contains the expected probability distribution (currently only: normal, uniform, gamma)
  • Parameters of each distribution can either be estimated from the data or be specified by the user
  • A histogram-like plot contrasts the empirical vs. the expected (theoretical) distribution
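
The general idea behind these steps can be sketched in base R. This is not the package's internal code: it is a minimal illustration for the normal case that uses simulated data, estimates the parameters from the data (the analogue of guess = TRUE), and compares observed bin counts with the counts expected under the fitted distribution. The names x, comparison and sqrt_diff are hypothetical.

```r
# Simulated measurements standing in for a study variable.
set.seed(42)
x <- rnorm(500, mean = 120, sd = 15)
x <- x[!is.na(x)]                      # drop missing values

# Estimate the parameters of the assumed normal distribution.
par1 <- mean(x)                        # first parameter: mean
par2 <- sd(x)                          # second parameter: standard deviation

# Bin the data without plotting.
h <- hist(x, breaks = 20, plot = FALSE)

# Probability mass of each bin under the assumed distribution.
p_expected <- diff(pnorm(h$breaks, mean = par1, sd = par2))

comparison <- data.frame(
  bin_mid  = h$mids,
  observed = h$counts,
  expected = length(x) * p_expected
)

# Rootogram-style comparison on the square-root scale.
comparison$sqrt_diff <- sqrt(comparison$expected) - sqrt(comparison$observed)

print(head(comparison))
```

Bins with a large absolute sqrt_diff are those where the empirical distribution departs most from the assumed one. For a uniform or gamma assumption, pnorm would be replaced by punif or pgamma with the corresponding parameters.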

See Also

Online Documentation

  • Maintainer: Stephan Struckmann
  • License: BSD_2_clause + file LICENSE
  • Last published: 2025-03-05