Estimate mixing proportions from reference and mixture datasets
Takes a mixture and a reference data frame of two-column genetic data, and a desired method of estimation for the population mixture proportions (MCMC, PB, or BH MCMC). Returns the output of the chosen estimation method.
reference: a data frame of two-column genetic format data, preceded by "repunit", "collection", and "indiv" columns. A "sample_type" column is not needed, and it will be overwritten if provided.
mixture: a data frame of two-column genetic format data. Must have the same structure as the reference data frame, but the "collection" and "repunit" columns are ignored. A "sample_type" column is not needed, and it will be overwritten if provided.
gen_start_col: the first column of genetic data in both data frames
method: this must be "MCMC". "PB" and "BH" are no longer supported in this function.
reps: the number of iterations to be performed in MCMC
burn_in: how many reps to discard in the beginning of MCMC when doing the mean calculation. They will still be returned in the traces if desired.
sample_int_Pi: the number of reps between samples being taken for pi traces. If 0, no traces are taken. Only used in methods "MCMC" and "PB".
sample_int_PofZ: the number of reps between samples being taken for the posterior traces of each individual's collection of origin. If 0, no trace samples are taken. Used in all methods.
sample_int_omega: the number of reps between samples being taken for collection proportion traces. If 0, no traces are taken. Only used in method "BH".
sample_int_rho: the number of reps between samples being taken for reporting unit proportion traces. If 0, no traces are taken. Only used in method "BH".
sample_int_PofR: the number of reps between samples being taken for the posterior traces of each individual's reporting unit of origin. If 0, no trace samples are taken. Only used in method "BH".
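The interaction between reps, burn_in, and a sampling interval can be sketched in base R. This is only an illustration under stated assumptions: the draws are simulated with rbeta rather than produced by this function, the names pi_by_rep, sample_int, and pi_trace are hypothetical, and the exact reps at which the function records trace samples are not documented here, so the indexing below is a guess at one reasonable convention.

```r
# Toy stand-in for one collection's proportion at each MCMC rep.
# These are simulated values, NOT output from ref_and_mix_pipeline.
set.seed(1)
reps <- 2000
burn_in <- 100
sample_int <- 10  # plays the role of sample_int_Pi (hypothetical name)
pi_by_rep <- rbeta(reps, 2, 5)

# The mean calculation discards the first burn_in reps.
post_mean <- mean(pi_by_rep[(burn_in + 1):reps])

# A trace keeps every sample_int-th rep (assumed convention); reps inside
# the burn-in period are still part of the trace. An interval of 0 means
# no trace is kept.
if (sample_int > 0) {
  trace_idx <- seq(sample_int, reps, by = sample_int)
} else {
  trace_idx <- integer(0)
}
pi_trace <- pi_by_rep[trace_idx]
```

With these settings the trace holds reps / sample_int values, while the mean is based on reps - burn_in values.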
Returns
ref_and_mix_pipeline returns the standard output of the chosen mixing proportion estimation method (always a list). For method "PB", it returns the standard MCMC results, as well as the bootstrap-corrected collection proportions under $mean$bootstrap.
Details
"MCMC" estimates mixing proportions and individual posterior probabilities of assignment through Markov-chain Monte Carlo, while "PB" does the same with a parametric bootstrapping correction, and "BH" uses the misassignment-scaled, hierarchical MCMC. All methods use a uniform 1/(# collections or RUs) prior for pi/omega and rho.
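To make the idea behind the "MCMC" method concrete, here is a minimal Gibbs-sampler sketch in base R. It is not the package's implementation. It assumes the per-individual likelihoods for each collection are already available as a matrix (simulated here as lik), and it interprets the uniform prior as Dirichlet parameters of 1/(# collections) each. All object names are hypothetical.

```r
set.seed(42)
n_ind <- 200
K <- 3                                  # number of collections
true_pi <- c(0.6, 0.3, 0.1)
z_true <- sample(K, n_ind, replace = TRUE, prob = true_pi)

# Simulated likelihood of each individual's genotype under each collection
# (assumption: in practice these come from reference allele frequencies).
lik <- matrix(runif(n_ind * K, 0.05, 0.2), n_ind, K)
lik[cbind(seq_len(n_ind), z_true)] <- runif(n_ind, 0.6, 1)

prior <- rep(1 / K, K)                  # assumed Dirichlet parameters
reps <- 500
burn_in <- 100
pi_cur <- rep(1 / K, K)
pi_sum <- numeric(K)
pofz_sum <- matrix(0, n_ind, K)

for (r in seq_len(reps)) {
  # Posterior probability of each collection of origin, given current pi
  w <- sweep(lik, 2, pi_cur, `*`)
  pofz <- w / rowSums(w)
  # Draw an allocation for every individual
  z <- apply(pofz, 1, function(p) sample.int(K, 1, prob = p))
  counts <- tabulate(z, nbins = K)
  # Draw pi from Dirichlet(prior + counts) using normalized gamma variates
  g <- rgamma(K, shape = prior + counts)
  pi_cur <- g / sum(g)
  # Accumulate only after the burn-in period
  if (r > burn_in) {
    pi_sum <- pi_sum + pi_cur
    pofz_sum <- pofz_sum + pofz
  }
}

pi_mean <- pi_sum / (reps - burn_in)      # estimated mixing proportions
pofz_mean <- pofz_sum / (reps - burn_in)  # individual assignment posteriors
```

Here pi_mean plays the role of the estimated mixing proportions and pofz_mean the individual posterior probabilities of assignment. The "PB" and "BH" variants add a bootstrap correction and a hierarchical structure, respectively, which this sketch does not attempt.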
Examples
reference <- small_chinook_ref
mixture <- small_chinook_mix
gen_start_col <- 5
# this function expects things as factors. This function is old and needs
# to be replaced and deprecated.
reference$repunit <- factor(reference$repunit, levels = unique(reference$repunit))
reference$collection <- factor(reference$collection, levels = unique(reference$collection))
mixture$repunit <- factor(mixture$repunit, levels = unique(mixture$repunit))
mixture$collection <- factor(mixture$collection, levels = unique(mixture$collection))
mcmc <- ref_and_mix_pipeline(reference, mixture, gen_start_col, method = "MCMC")