ref_and_mix_pipeline function

Estimate mixing proportions from reference and mixture datasets

Takes a mixture and a reference dataframe of two-column genetic data, and a desired method of estimation for the population mixture proportions (MCMC, PB, or BH MCMC). Returns the output of the chosen estimation method.

ref_and_mix_pipeline(
  reference,
  mixture,
  gen_start_col,
  method = "MCMC",
  reps = 2000,
  burn_in = 100,
  sample_int_Pi = 0,
  sample_int_PofZ = 0,
  sample_int_omega = 0,
  sample_int_rho = 0,
  sample_int_PofR = 0
)

Arguments

  • reference: a dataframe of two-column genetic format data, preceded by "repunit", "collection", and "indiv" columns. Does not need a "sample_type" column; if one is provided, it will be overwritten.
  • mixture: a dataframe of two-column genetic format data. Must have the same structure as the reference dataframe, but the "collection" and "repunit" columns are ignored. Does not need a "sample_type" column; if one is provided, it will be overwritten.
  • gen_start_col: the first column of genetic data in both data frames
  • method: this must be "MCMC". "PB" and "BH" are no longer supported in this function.
  • reps: the number of iterations to be performed in MCMC
  • burn_in: how many reps to discard in the beginning of MCMC when doing the mean calculation. They will still be returned in the traces if desired.
  • sample_int_Pi: the number of reps between samples being taken for pi traces. If 0 no traces are taken. Only used in methods "MCMC" and "PB".
  • sample_int_PofZ: the number of reps between samples being taken for the posterior traces of each individual's collection of origin. If 0 no trace samples are taken. Used in all methods.
  • sample_int_omega: the number of reps between samples being taken for collection proportion traces. If 0 no traces are taken. Only used in method "BH"
  • sample_int_rho: the number of reps between samples being taken for reporting unit proportion traces. If 0 no traces are taken. Only used in method "BH"
  • sample_int_PofR: the number of reps between samples being taken for the posterior traces of each individual's reporting unit of origin. If 0 no trace samples are taken. Only used in method "BH".
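
The interplay of reps, burn_in, and the sample_int_* arguments can be illustrated with a small base R sketch. It does not call the package; the draws vector is a hypothetical stand-in for one sampled quantity per rep, and the exact reps at which traces are recorded (here every sample_int-th rep, counted from the start) is an assumption rather than something this page specifies.

```r
set.seed(1)

reps       <- 2000  # total MCMC iterations (the documented default)
burn_in    <- 100   # initial reps excluded from the mean (the documented default)
sample_int <- 10    # hypothetical trace interval; 0 would mean "no trace"

# Hypothetical stand-in for one mixing proportion sampled at each rep
draws <- rbeta(reps, 20, 80)

# The mean is computed only over the reps that follow the burn-in
kept      <- draws[(burn_in + 1):reps]
post_mean <- mean(kept)

# Assumed trace convention: record every sample_int-th rep, burn-in included,
# which matches the note that burn-in reps are still returned in the traces
trace_idx <- if (sample_int > 0) seq(sample_int, reps, by = sample_int) else integer(0)
trace     <- draws[trace_idx]

length(kept)   # number of reps contributing to the mean
length(trace)  # number of recorded trace samples
```

With the defaults above, 1900 reps contribute to the mean, and an interval of 10 would leave 200 trace samples under the assumed convention.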

Returns

ref_and_mix_pipeline returns the standard output of the chosen mixing proportion estimation method (always a list). For method "PB", returns the standard MCMC results, as well as the bootstrap-corrected collection proportions under $mean$bootstrap.

Details

"MCMC" estimates mixing proportions and individual posterior probabilities of assignment through Markov-chain Monte Carlo, while "PB" does the same with a parametric bootstrapping correction, and "BH" uses the misassignment-scaled, hierarchical MCMC. All methods use a uniform 1/(# collections or RUs) prior for pi/omega and rho.
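
As a rough illustration of the "uniform 1/(# collections or RUs)" value mentioned above, the following base R sketch computes it for a toy reference table. The toy data and variable names are hypothetical, and how the package feeds these values into its sampler is not described on this page.

```r
# Hypothetical toy reference table: 2 reporting units, 3 collections
toy_ref <- data.frame(
  repunit    = factor(c("North", "North", "North", "South", "South")),
  collection = factor(c("A", "A", "B", "C", "C"))
)

n_coll <- nlevels(toy_ref$collection)  # number of collections
n_ru   <- nlevels(toy_ref$repunit)     # number of reporting units

# One equal prior value per collection (the pi/omega case in the text)
prior_coll <- setNames(rep(1 / n_coll, n_coll), levels(toy_ref$collection))

# One equal prior value per reporting unit (the rho case in the text)
prior_ru <- setNames(rep(1 / n_ru, n_ru), levels(toy_ref$repunit))

prior_coll
prior_ru
```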

Examples

reference <- small_chinook_ref
mixture <- small_chinook_mix
gen_start_col <- 5

# this function expects things as factors. This function is old and needs
# to be replaced and deprecated.
reference$repunit <- factor(reference$repunit, levels = unique(reference$repunit))
reference$collection <- factor(reference$collection, levels = unique(reference$collection))
mixture$repunit <- factor(mixture$repunit, levels = unique(mixture$repunit))
mixture$collection <- factor(mixture$collection, levels = unique(mixture$collection))

mcmc <- ref_and_mix_pipeline(reference, mixture, gen_start_col, method = "MCMC")
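
The factor() calls in the example pass levels = unique(...), which keeps the levels in order of first appearance instead of the sorted order that factor() uses by default. A minimal base R sketch of the difference, using made-up labels rather than the package data:

```r
# Hypothetical reporting-unit labels, in the order they appear in a table
x <- c("Klamath", "Central Valley", "Klamath", "Alsea")

# Default: levels are sorted
sorted_levels <- factor(x)

# As in the example: levels follow order of first appearance
appearance_levels <- factor(x, levels = unique(x))

levels(sorted_levels)          # "Alsea" "Central Valley" "Klamath"
levels(appearance_levels)      # "Klamath" "Central Valley" "Alsea"
as.integer(appearance_levels)  # 1 2 1 3
```

Because the integer codes depend on the level order, fixing the levels this way keeps the codes aligned with the row order of the input table.
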
  • Maintainer: Eric C. Anderson
  • License: CC0
  • Last published: 2024-01-24