convert_df function

Automatically compute effect sizes from a well-formatted dataset.

convert_df(
  x,
  measure = c("d", "g", "md", "logor", "logrr", "logirr", "nnt", "r", "z", "logvr", "logcvr"),
  main_es = TRUE,
  es_selected = c("auto", "hierarchy", "minimum", "maximum"),
  selection_auto = c("crude", "paired", "adjusted"),
  split_adjusted = TRUE,
  format_adjusted = c("wide", "long"),
  verbose = TRUE,
  max_asymmetry = 10,
  hierarchy = "means_sd > means_se > means_ci",
  table_2x2_to_cor = "tetrachoric",
  rr_to_or = "metaumbrella",
  or_to_rr = "metaumbrella_cases",
  or_to_cor = "bonett",
  smd_to_cor = "viechtbauer",
  pre_post_to_smd = "bonett",
  r_pre_post = 0.5,
  cor_to_smd = "viechtbauer",
  unit_type = "raw_scale",
  yates_chisq = FALSE
)

Arguments

  • x: a well-formatted dataset
  • measure: the effect size measure that will be estimated from the information stored in the dataset. See details.
  • main_es: a logical variable indicating whether a main effect size should be selected when overlapping data are present. See details.
  • es_selected: the method used to select the main effect size when several types of input data allow estimating an effect size for the same association/comparison. Must be either "auto" (an automatic hierarchy of input data is applied; see the selection_auto argument), "hierarchy" (the effect size computed from the information placed highest in the hierarchy argument will be selected), "minimum" (the smallest effect size will be selected) or "maximum" (the largest effect size will be selected). See details.
  • selection_auto: a character string specifying which automatic hierarchy to use (only used when es_selected = "auto" and measure is "d", "g" or "md"). See details.
  • split_adjusted: a logical value indicating whether crude and adjusted effect sizes should be presented separately. See details.
  • format_adjusted: presentation format of the adjusted effect sizes. See details.
  • verbose: a logical variable indicating whether text outputs and messages should be generated. We recommend setting this option to FALSE only after having carefully read all the generated messages.
  • max_asymmetry: A percentage indicating the tolerance before detecting asymmetry in the 95% CI bounds.
  • hierarchy: a character string indicating the hierarchy in the information to be prioritized for the effect size calculations. See details.
  • table_2x2_to_cor: formula used to obtain a correlation coefficient from the contingency table. For now only 'tetrachoric' is available.
  • rr_to_or: formula used to convert the rr value into an odds ratio.
  • or_to_rr: formula used to convert the or value into a risk ratio.
  • or_to_cor: formula used to convert the or value into a correlation coefficient.
  • smd_to_cor: formula used to convert the cohen_d value into a correlation coefficient.
  • pre_post_to_smd: formula used to obtain a SMD from pre/post means and SD of two independent groups.
  • r_pre_post: pre-post correlation across the two groups (use this argument only if the precise correlation in each group is unknown)
  • cor_to_smd: formula used to convert a correlation coefficient value into a SMD.
  • unit_type: the type of unit for the unit_increase_iv argument. Must be either "sd" or "value" (see es_from_pearson_r).
  • yates_chisq: a logical value indicating whether the chi-square test was performed using Yates' correction for continuity.

Returns

The convert_df() function returns a list of more than 70 dataframes (one for each function automatically applied to the dataset). These dataframes systematically contain the columns described in metaConvert-package. The list of dataframes can be easily converted to a single, calculations-ready dataframe using the summary function (see summary.metaConvert).
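
For example (a minimal sketch; df.haza is the example dataset also used in the Examples section below):

library(metaConvert)
res <- convert_df(df.haza, measure = "g")  # list of data frames
dat <- summary(res)                        # single, calculations-ready data frame
head(dat)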

Details

This function automatically computes or converts between 11 effect size measures from any relevant type of input data stored in the dataset you pass to it.

Effect size measures

Possible effect size measures are:

  1. Cohen's d ("d")
  2. Hedges' g ("g")
  3. mean difference ("md")
  4. (log) odds ratio ("or" and "logor")
  5. (log) risk ratio ("rr" and "logrr")
  6. (log) incidence rate ratio ("irr" and "logirr")
  7. correlation coefficient ("r")
  8. transformed r-to-z correlation coefficient ("z")
  9. log variability ratio ("logvr")
  10. log coefficient of variation ("logcvr")
  11. number needed to treat ("nnt")

Computation of a main effect size

If you enter multiple types of input data (e.g., means/SDs of two groups and a Student's t-test value) for the same comparison (i.e., for the same row of the dataset), the convert_df() function can have two behaviours (see the sketch after this list). If you set:

  • main_es = FALSE: the function will estimate all possible effect sizes from all types of input data (which implies that a comparison with several types of input data will result in multiple rows in the dataframe returned by the function)
  • main_es = TRUE: the function will select one effect size per comparison (which implies that a comparison with several types of input data will result in a single row in the dataframe returned by the function)
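
A minimal sketch of the two behaviours, assuming a hypothetical well-formatted data frame dat that reports both means/SDs and Student's t values for the same comparisons:

# dat is a hypothetical dataset used only for illustration
all_es <- convert_df(dat, measure = "d", main_es = FALSE)
one_es <- convert_df(dat, measure = "d", main_es = TRUE)
nrow(summary(all_es))  # can exceed the number of comparisons
nrow(summary(one_es))  # exactly one row per comparison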

Selection of input data for the computation of the main effect size

If you choose to estimate one main effect size (i.e., by setting main_es = TRUE), you have several options for selecting this main effect size (a usage sketch follows this list). If you set:

  • es_selected = "auto": the main effect size will be selected automatically, by prioritizing specific types of input data over others (see the next section, "Hierarchy").
  • es_selected = "hierarchy": the main effect size will be selected according to the user-defined hierarchy of input data passed to the hierarchy argument (see the next section, "Hierarchy").
  • es_selected = "minimum": the smallest effect size available will be selected as the main effect size.
  • es_selected = "maximum": the largest effect size available will be selected as the main effect size.

Hierarchy

More than 70 different combinations of input data can be used to estimate an effect size. You can retrieve the effect size measures estimated by each combination of input data using the see_input_data() function or online at https://metaconvert.org/input.html.

You have two options to use a hierarchy in the types of input data.

  • an automatic way (es_selected = "auto")
  • a manual way (es_selected = "hierarchy")

Automatic

If you select the automatic hierarchy, the following types of input data will be prioritized for each effect size measure (a usage sketch follows these lists).

Crude SMD or MD (measure=c("d", "g", "md") and selection_auto="crude")
  1. User's input effect size value
  2. SMD value
  3. Means at post-test
  4. ANOVA/Student's t-test/point biserial correlation statistics
  5. Linear regression estimates
  6. Mean difference values
  7. Quartiles/median/maximum values
  8. Post-test means extracted from a plot
  9. Pre-test+post-test means or mean change
  10. Paired ANOVA/t-test statistics
  11. Odds ratio value
  12. Contingency table
  13. Correlation coefficients
  14. Phi/chi-square value
Paired SMD or MD (measure=c("d", "g", "md") and selection_auto="paired")
  1. User's input effect size value
  2. Paired SMD value
  3. Pre-test+post-test means or mean change
  4. Paired ANOVA/t-test statistics
  5. Means at post-test
  6. ANOVA/Student's t-test/point biserial correlation
  7. Linear regression estimates
  8. Mean difference values
  9. Quartiles/median/maximum values
  10. Odds ratio value
  11. Contingency table
  12. Correlation coefficients
  13. Phi/chi-square value
Adjusted SMD or MD (measure=c("d", "g", "md") and selection_auto="adjusted")
  1. User's input adjusted effect size value
  2. Adjusted SMD value
  3. Estimated marginal means from ANCOVA
  4. F- or t-test value from ANCOVA
  5. Adjusted mean difference from ANCOVA
  6. Estimated marginal means from ANCOVA extracted from a plot
Odds Ratio (measure=c("or"))
  1. User's input effect size value
  2. Odds ratio value
  3. Contingency table
  4. Risk ratio values
  5. Phi/chi-square value
  6. Correlation coefficients
  7. (Then, the hierarchy used for measure="d" or "g" with selection_auto="crude")
Risk Ratio (measure=c("rr"))
  1. User's input effect size value
  2. Risk ratio values
  3. Contingency table
  4. Odds ratio values
  5. Phi/chi-square value
Incidence rate ratio (measure=c("irr"))
  1. User's input effect size value
  2. Number of cases and disease-free observation time
Correlation (measure=c("r", "z"))
  1. User's input effect size value
  2. Correlation coefficients
  3. Contingency table
  4. Odds ratio value
  5. Phi/chi-square value
  6. SMD value
  7. Means at post-test
  8. ANOVA/Student's t-test/point biserial correlation
  9. Linear regression estimates
  10. Mean difference values
  11. Quartiles/median/maximum values
  12. Post-test means extracted from a plot
  13. Pre-test+post-test means or mean change
  14. Paired ANOVA/t-test statistics
Variability ratios (measure=c("vr", "cvr"))
  1. User's input effect size value
  2. Means/variability indices at post-test
  3. Means/variability indices at post-test extracted from a plot
Number needed to treat (measure=c("nnt"))
  1. User's input effect size value
  2. Contingency table
  3. Odds ratio values
  4. Risk ratio values
  5. Phi/chi-square value
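
For instance, to prioritize pre/post input data when computing a SMD (a sketch using the hypothetical dat):

# prioritize paired (pre/post) input data for Hedges' g
res_paired <- convert_df(dat, measure = "g",
                         es_selected = "auto", selection_auto = "paired")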

Manual

If you select a manual hierarchy, you can specify the order in which you want to use each type of input data. You can prioritize some types of input data by placing them at the beginning of the hierarchy argument, and you must separate all input data with a ">" separator. For example, if you set:

  • hierarchy = "means_sd > means_se > student_t", the convert_df function will prioritize the means + SD, then the means + SE, then the Student's t-test to estimate the main effect size.
  • hierarchy = "2x2 > or_se > phi", the convert_df function will prioritize the contingency table, then the odds ratio value + SE, then the phi coefficient to estimate the main effect size.

Importantly, if none of the types of input data indicated in the hierarchy argument can be used to estimate the target effect size measure, the convert_df() function will automatically try to use other types of input data to estimate an effect size.
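
In practice, a manual hierarchy looks like this (again with the hypothetical dat):

# prioritize means + SD, then means + SE, then Student's t-test
res_manual <- convert_df(dat, measure = "d",
                         es_selected = "hierarchy",
                         hierarchy = "means_sd > means_se > student_t")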

Adjusted effect sizes

Some datasets will be composed of crude (i.e., non-adjusted) types of input data (such as standard means + SD, Student's t-test, etc.) and adjusted types of input data (such as means + SE from an ANCOVA model, a t-test from an ANCOVA, etc.).

In these situations, you can decide to:

  • treat crude and adjusted input data the same way (split_adjusted = FALSE)
  • split calculations for crude and adjusted types of input data (split_adjusted = TRUE)

If you want to split the calculations, you can decide to present the final dataset:

  • in a long format (i.e., crude and adjusted effect sizes presented in separate rows; format_adjusted = "long")
  • in a wide format (i.e., crude and adjusted effect sizes presented in separate columns; format_adjusted = "wide")
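
For instance (a sketch with the hypothetical dat):

# compute crude and adjusted effect sizes separately,
# presented side by side in separate columns
res_wide <- convert_df(dat, measure = "g",
                       split_adjusted = TRUE, format_adjusted = "wide")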

Examples

res <- convert_df(df.haza,
                  measure = "g",
                  split_adjusted = TRUE,
                  es_selected = "minimum",
                  format_adjusted = "long")
summary(res)
  • Maintainer: Corentin J. Gosling
  • License: GPL (>= 3)
  • Last published: 2025-04-11
