ref_and_mix_pipeline() R function from [rubias]

Estimate mixing proportions from reference and mixture datasets

Takes a mixture and a reference data frame of two-column genetic data, along with the desired method for estimating the population mixture proportions (MCMC, PB, or BH MCMC). Returns the output of the chosen estimation method.
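To make the "two-column genetic data" layout concrete, here is a minimal sketch in base R. The data are made up for illustration (the collection names, individual IDs, locus column names, and allele codes are all hypothetical); the only things taken from this page are the "repunit", "collection", and "indiv" columns that come before the genetic data, and the idea that each locus occupies two adjacent columns.

```r
# Hypothetical toy data: two diploid loci, each stored in two adjacent columns
# (one column per gene copy). All names and allele codes are invented.
reference <- data.frame(
  repunit    = c("North", "North", "South", "South"),
  collection = c("RiverA", "RiverA", "RiverB", "RiverB"),
  indiv      = c("ref_1", "ref_2", "ref_3", "ref_4"),
  Loc1_a     = c(1L, 1L, 2L, 2L),
  Loc1_b     = c(1L, 2L, 2L, 2L),
  Loc2_a     = c(3L, 3L, 4L, 4L),
  Loc2_b     = c(3L, 4L, 4L, 3L),
  stringsAsFactors = FALSE
)

# The mixture has the same columns; its repunit and collection values are
# placeholders here, since the page says they are ignored for the mixture.
mixture <- data.frame(
  repunit    = NA_character_,
  collection = "mix",
  indiv      = c("mix_1", "mix_2"),
  Loc1_a     = c(1L, 2L),
  Loc1_b     = c(2L, 2L),
  Loc2_a     = c(3L, 4L),
  Loc2_b     = c(3L, 4L),
  stringsAsFactors = FALSE
)

# With three metadata columns, the genetic data start at column 4.
gen_start_col <- 4

# Every locus uses two columns, so the number of loci is half the
# number of genetic columns.
n_loci <- (ncol(reference) - gen_start_col + 1) / 2
```

If a data frame carries an extra metadata column ahead of the loci, gen_start_col shifts accordingly.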


ref_and_mix_pipeline(
  reference,
  mixture,
  gen_start_col,
  method = "MCMC",
  reps = 2000,
  burn_in = 100,
  sample_int_Pi = 0,
  sample_int_PofZ = 0,
  sample_int_omega = 0,
  sample_int_rho = 0,
  sample_int_PofR = 0
)

Arguments

reference: a data frame of two-column genetic format data, preceded by "repunit", "collection", and "indiv" columns. Does not need a "sample_type" column; if one is provided, it will be overwritten.
mixture: a data frame of two-column genetic format data. Must have the same structure as the reference data frame, but the "collection" and "repunit" columns are ignored. Does not need a "sample_type" column; if one is provided, it will be overwritten.
gen_start_col: the first column of genetic data in both data frames
method: this must be "MCMC". "PB" and "BH" are no longer supported in this function.
reps: the number of iterations to be performed in MCMC
burn_in: how many reps to discard in the beginning of MCMC when doing the mean calculation. They will still be returned in the traces if desired.
sample_int_Pi: the number of reps between samples being taken for pi traces. If 0, no traces are taken. Only used in methods "MCMC" and "PB".
sample_int_PofZ: the number of reps between samples being taken for the posterior traces of each individual's collection of origin. If 0, no trace samples are taken. Used in all methods.
sample_int_omega: the number of reps between samples being taken for collection proportion traces. If 0, no traces are taken. Only used in method "BH".
sample_int_rho: the number of reps between samples being taken for reporting unit proportion traces. If 0, no traces are taken. Only used in method "BH".
sample_int_PofR: the number of reps between samples being taken for the posterior traces of each individual's reporting unit of origin. If 0, no trace samples are taken. Only used in method "BH".
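
The interplay of reps, burn_in, and the sample_int_* arguments can be sketched in base R. This is an illustration only, not the package's code: the draws below are simulated stand-ins for sampler output, and the exact rep at which trace sampling starts is an assumption. What it does follow from the descriptions above is that burn-in reps are dropped from the mean but can still appear in a trace, and that an interval of 0 yields no trace.

```r
set.seed(1)

reps          <- 2000
burn_in       <- 100
sample_int_Pi <- 10
K             <- 3   # hypothetical number of collections

# Stand-in for sampler output: one row per rep, one column per collection,
# with each row normalised to sum to 1 like a vector of mixing proportions.
draws    <- matrix(rgamma(reps * K, shape = 2), nrow = reps)
pi_draws <- draws / rowSums(draws)

# The mean calculation discards the first burn_in reps.
pi_mean <- colMeans(pi_draws[(burn_in + 1):reps, , drop = FALSE])

# The trace keeps every sample_int_Pi-th rep, burn-in included.
# Assumption: sampling starts at rep sample_int_Pi; an interval of 0 keeps nothing.
trace_rows <- if (sample_int_Pi > 0) {
  seq(sample_int_Pi, reps, by = sample_int_Pi)
} else {
  integer(0)
}
pi_trace <- pi_draws[trace_rows, , drop = FALSE]
```

A shorter interval gives a denser trace at the cost of a larger returned object.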

Returns

ref_and_mix_pipeline returns the standard output of the chosen mixing proportion estimation method (always a list). For method "PB", it returns the standard MCMC results, as well as the bootstrap-corrected collection proportions under $mean$bootstrap.

Details

"MCMC" estimates mixing proportions and individual posterior probabilities of assignment through Markov-chain Monte Carlo, while "PB" does the same with a parametric bootstrapping correction, and "BH" uses the misassignment-scaled, hierarchical MCMC. All methods use a uniform 1/(# collections or RUs) prior for pi/omega and rho.
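
As a small illustration of the prior described above, the sketch below builds the 1/(# collections) and 1/(# RUs) vectors for a made-up set of collections and reporting units. It reads the sentence literally, with every collection (or reporting unit) given the same value; how the package feeds these values into the sampler is not shown here and is not assumed.

```r
# Hypothetical collections and the reporting unit each one belongs to.
collections <- c("RiverA", "RiverB", "RiverC", "RiverD")
repunits    <- c("North", "North", "South", "South")

n_coll <- length(unique(collections))
n_ru   <- length(unique(repunits))

# One equal value per collection (for pi / omega) ...
pi_prior <- setNames(rep(1 / n_coll, n_coll), unique(collections))

# ... and one equal value per reporting unit (for rho).
rho_prior <- setNames(rep(1 / n_ru, n_ru), unique(repunits))
```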

Examples


library(rubias)

# Example reference and mixture data sets
reference <- small_chinook_ref
mixture <- small_chinook_mix
gen_start_col <- 5

# This function expects "repunit" and "collection" to be factors, so convert
# them first, keeping the levels in their order of appearance.
# Note: this function is old and needs to be replaced and deprecated.

reference$repunit <- factor(reference$repunit, levels = unique(reference$repunit))
reference$collection <- factor(reference$collection, levels = unique(reference$collection))
mixture$repunit <- factor(mixture$repunit, levels = unique(mixture$repunit))
mixture$collection <- factor(mixture$collection, levels = unique(mixture$collection))

# Run the MCMC estimation with the default reps and burn_in
mcmc <- ref_and_mix_pipeline(reference, mixture, gen_start_col, method = "MCMC")

rubias package

Maintainer: Eric C. Anderson
License: CC0
Last published: 2024-01-24
