convert_df function

Automatically compute effect sizes from a well-formatted dataset.

convert_df(
  x,
  measure = c("d", "g", "md", "logor", "logrr", "logirr", "nnt", "r", "z", "logvr", "logcvr"),
  main_es = TRUE,
  es_selected = c("auto", "hierarchy", "minimum", "maximum"),
  selection_auto = c("crude", "paired", "adjusted"),
  split_adjusted = TRUE,
  format_adjusted = c("wide", "long"),
  verbose = TRUE,
  max_asymmetry = 10,
  hierarchy = "means_sd > means_se > means_ci",
  table_2x2_to_cor = "tetrachoric",
  rr_to_or = "metaumbrella",
  or_to_rr = "metaumbrella_cases",
  or_to_cor = "bonett",
  smd_to_cor = "viechtbauer",
  pre_post_to_smd = "bonett",
  r_pre_post = 0.5,
  cor_to_smd = "viechtbauer",
  unit_type = "raw_scale",
  yates_chisq = FALSE
)

Arguments

  • x: a well-formatted dataset
  • measure: the effect size measure that will be estimated from the information stored in the dataset. See details.
  • main_es: a logical variable indicating whether a main effect size should be selected when overlapping data are present. See details.
  • es_selected: the method used to select the main effect size when several types of input data allow estimating an effect size for the same association/comparison. Must be either "auto" (an automatic hierarchy of input data is applied; see the selection_auto argument), "hierarchy" (the effect size computed from the information placed highest in the hierarchy argument will be selected), "minimum" (the smallest effect size will be selected) or "maximum" (the largest effect size will be selected). See details.
  • selection_auto: a character string specifying which automatic hierarchy to use (only used when es_selected = "auto" and measure is "d", "g" or "md"). See details.
  • split_adjusted: a logical value indicating whether crude and adjusted effect sizes should be presented separately. See details.
  • format_adjusted: presentation format of the adjusted effect sizes. See details.
  • verbose: a logical variable indicating whether text outputs and messages should be generated. We recommend setting this option to FALSE only after having carefully read all the generated messages.
  • max_asymmetry: A percentage indicating the tolerance before detecting asymmetry in the 95% CI bounds.
  • hierarchy: a character string indicating the hierarchy in the information to be prioritized for the effect size calculations. See details.
  • table_2x2_to_cor: formula used to obtain a correlation coefficient from the contingency table. For now only 'tetrachoric' is available.
  • rr_to_or: formula used to convert the rr value into an odds ratio.
  • or_to_rr: formula used to convert the or value into a risk ratio.
  • or_to_cor: formula used to convert the or value into a correlation coefficient.
  • smd_to_cor: formula used to convert the cohen_d value into a correlation coefficient.
  • pre_post_to_smd: formula used to obtain a SMD from pre/post means and SD of two independent groups.
  • r_pre_post: pre-post correlation across the two groups (use this argument only if the precise correlation in each group is unknown)
  • cor_to_smd: formula used to convert a correlation coefficient value into a SMD.
  • unit_type: the type of unit for the unit_increase_iv argument. Must be either "sd" or "value" (see es_from_pearson_r).
  • yates_chisq: a logical value indicating whether the chi-square test was performed using Yates' correction for continuity.

Returns

The convert_df() function returns a list of more than 70 dataframes (one for each function automatically applied to the dataset). These dataframes systematically contain the columns described in metaConvert-package. The list of dataframes can be easily converted to a single, calculations-ready dataframe using the summary function (see summary.metaConvert).
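
For example (a minimal sketch; df.haza is the example dataset also used in the Examples section below):

library(metaConvert)
res <- convert_df(df.haza, measure = "g")  # list of data frames
dat <- summary(res)                        # single, calculations-ready data frame
head(dat)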

Details

This function automatically computes or converts between 11 effect size measures from any relevant type of input data stored in the dataset you pass to it.

Effect size measures

Possible effect size measures are:

  1. Cohen's d ("d")
  2. Hedges' g ("g")
  3. mean difference ("md")
  4. (log) odds ratio ("or" and "logor")
  5. (log) risk ratio ("rr" and "logrr")
  6. (log) incidence rate ratio ("irr" and "logirr")
  7. correlation coefficient ("r")
  8. transformed r-to-z correlation coefficient ("z")
  9. log variability ratio ("logvr")
  10. log coefficient of variation ("logcvr")
  11. number needed to treat ("nnt")

Computation of a main effect size

If you enter multiple types of input data (e.g., means/SDs of two groups and a Student's t-test value) for the same comparison (i.e., for the same row of the dataset), the convert_df() function can have two behaviours (see the sketch after this list). If you set:

  • main_es = FALSE: the function will estimate all possible effect sizes from all types of input data (which implies that a comparison with several types of input data will result in multiple rows in the dataframe returned by the function)
  • main_es = TRUE: the function will select one effect size per comparison (which implies that a comparison with several types of input data will result in a single row in the dataframe returned by the function)
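
A minimal sketch of the two behaviours, assuming a hypothetical well-formatted data frame dat that reports both means/SDs and Student's t values for the same comparisons:

# dat is a hypothetical dataset used only for illustration
all_es <- convert_df(dat, measure = "d", main_es = FALSE)
one_es <- convert_df(dat, measure = "d", main_es = TRUE)
nrow(summary(all_es))  # can exceed the number of comparisons
nrow(summary(one_es))  # exactly one row per comparison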

Selection of input data for the computation of the main effect size

If you choose to estimate one main effect size (i.e., by setting main_es = TRUE), you have several options for selecting this main effect size (a usage sketch follows this list). If you set:

  • es_selected = "auto": the main effect size will be selected automatically, by prioritizing specific types of input data over others (see the next section, "Hierarchy").
  • es_selected = "hierarchy": the main effect size will be selected according to the user-defined hierarchy of input data passed to the hierarchy argument (see the next section, "Hierarchy").
  • es_selected = "minimum": the smallest effect size available will be selected as the main effect size.
  • es_selected = "maximum": the largest effect size available will be selected as the main effect size.

Hierarchy

More than 70 different combinations of input data can be used to estimate an effect size. You can retrieve the effect size measures estimated by each combination of input data using the see_input_data() function or online at https://metaconvert.org/input.html.

You have two options to use a hierarchy in the types of input data.

  • an automatic way (es_selected = "auto")
  • a manual way (es_selected = "hierarchy")

Automatic

If you select the automatic hierarchy, the following types of input data will be prioritized for each effect size measure (a usage sketch follows these lists).

Crude SMD or MD (measure=c("d", "g", "md") and selection_auto="crude")
  1. User's input effect size value
  2. SMD value
  3. Means at post-test
  4. ANOVA/Student's t-test/point biserial correlation statistics
  5. Linear regression estimates
  6. Mean difference values
  7. Quartiles/median/maximum values
  8. Post-test means extracted from a plot
  9. Pre-test+post-test means or mean change
  10. Paired ANOVA/t-test statistics
  11. Odds ratio value
  12. Contingency table
  13. Correlation coefficients
  14. Phi/chi-square value
Paired SMD or MD (measure=c("d", "g", "md") and selection_auto="paired")
  1. User's input effect size value
  2. Paired SMD value
  3. Pre-test+post-test means or mean change
  4. Paired ANOVA/t-test statistics
  5. Means at post-test
  6. ANOVA/Student's t-test/point biserial correlation
  7. Linear regression estimates
  8. Mean difference values
  9. Quartiles/median/maximum values
  10. Odds ratio value
  11. Contingency table
  12. Correlation coefficients
  13. Phi/chi-square value
Adjusted SMD or MD (measure=c("d", "g", "md") and selection_auto="adjusted")
  1. User's input adjusted effect size value
  2. Adjusted SMD value
  3. Estimated marginal means from ANCOVA
  4. F- or t-test value from ANCOVA
  5. Adjusted mean difference from ANCOVA
  6. Estimated marginal means from ANCOVA extracted from a plot
Odds Ratio (measure=c("or"))
  1. User's input effect size value
  2. Odds ratio value
  3. Contingency table
  4. Risk ratio values
  5. Phi/chi-square value
  6. Correlation coefficients
  7. (Then, the hierarchy used for measure="d" or "g" with selection_auto="crude")
Risk Ratio (measure=c("rr"))
  1. User's input effect size value
  2. Risk ratio values
  3. Contingency table
  4. Odds ratio values
  5. Phi/chi-square value
Incidence rate ratio (measure=c("irr"))
  1. User's input effect size value
  2. Number of cases and disease-free observation time
Correlation (measure=c("r", "z"))
  1. User's input effect size value
  2. Correlation coefficients
  3. Contingency table
  4. Odds ratio value
  5. Phi/chi-square value
  6. SMD value
  7. Means at post-test
  8. ANOVA/Student's t-test/point biserial correlation
  9. Linear regression estimates
  10. Mean difference values
  11. Quartiles/median/maximum values
  12. Post-test means extracted from a plot
  13. Pre-test+post-test means or mean change
  14. Paired ANOVA/t-test statistics
Variability ratios (measure=c("vr", "cvr"))
  1. User's input effect size value
  2. Means/variability indices at post-test
  3. Means/variability indices at post-test extracted from a plot
Number needed to treat (measure=c("nnt"))
  1. User's input effect size value
  2. Contingency table
  3. Odds ratio values
  4. Risk ratio values
  5. Phi/chi-square value
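
For instance, to prioritize pre/post input data when computing a SMD (a sketch using the hypothetical dat):

# prioritize paired (pre/post) input data for Hedges' g
res_paired <- convert_df(dat, measure = "g",
                         es_selected = "auto", selection_auto = "paired")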

Manual

If you select a manual hierarchy, you can specify the order in which you want to use each type of input data. You can prioritize some types of input data by placing them at the beginning of the hierarchy argument, and you must separate all input data with a ">" separator. For example, if you set:

  • hierarchy = "means_sd > means_se > student_t", the convert_df function will prioritize the means + SD, then the means + SE, then the Student's t-test to estimate the main effect size.
  • hierarchy = "2x2 > or_se > phi", the convert_df function will prioritize the contingency table, then the odds ratio value + SE, then the phi coefficient to estimate the main effect size.

Importantly, if none of the types of input data indicated in the hierarchy argument can be used to estimate the target effect size measure, the convert_df() function will automatically try to use other types of input data to estimate an effect size.
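
In practice, a manual hierarchy looks like this (again with the hypothetical dat):

# prioritize means + SD, then means + SE, then Student's t-test
res_manual <- convert_df(dat, measure = "d",
                         es_selected = "hierarchy",
                         hierarchy = "means_sd > means_se > student_t")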

Adjusted effect sizes

Some datasets will be composed of crude (i.e., non-adjusted) types of input data (such as standard means + SD, Student's t-test, etc.) and adjusted types of input data (such as means + SE from an ANCOVA model, a t-test from an ANCOVA, etc.).

In these situations, you can decide to:

  • treat crude and adjusted input data the same way (split_adjusted = FALSE)
  • split calculations for crude and adjusted types of input data (split_adjusted = TRUE)

If you want to split the calculations, you can decide to present the final dataset:

  • in a long format (i.e., crude and adjusted effect sizes presented in separate rows; format_adjusted = "long")
  • in a wide format (i.e., crude and adjusted effect sizes presented in separate columns; format_adjusted = "wide")
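
For instance (a sketch with the hypothetical dat):

# compute crude and adjusted effect sizes separately,
# presented side by side in separate columns
res_wide <- convert_df(dat, measure = "g",
                       split_adjusted = TRUE, format_adjusted = "wide")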

Examples

res <- convert_df(df.haza,
                  measure = "g",
                  split_adjusted = TRUE,
                  es_selected = "minimum",
                  format_adjusted = "long")
summary(res)
  • Maintainer: Corentin J. Gosling
  • License: GPL (>= 3)
  • Last published: 2025-04-11
