tcf2param_list function

Generate MCMC parameter list from two-column genetic data & print summary

This function wraps all of the steps needed to build the parameter list required for genotype log-likelihood calculation, starting from two-column genetic data.

tcf2param_list( D, gen_start_col, samp_type = "both", alle_freq_prior = list(const_scaled = 1), summ = T, ploidies )

Arguments

  • D: A data frame containing two-column genetic data, preceded by metadata. The header of the first genetic data column in each pair gives the locus name; the header of the second column is ignored. Locus names must not contain spaces!

    Required metadata includes a column of unique individual identifiers named "indiv", a column named "collection" designating the sample groups, a column named "repunit" designating the reporting unit of origin of each fish, and a column named "sample_type" denoting each individual as a "reference" or "mixture" sample. No NAs should be present in the metadata.

  • gen_start_col: The index (number) of the column in which genetic data starts. All columns from this index onward must contain only genetic data.

  • samp_type: the sample groups to be included in the individual genotype list, whose likelihoods will be used in MCMC. Options are "reference", "mixture", and "both".

  • alle_freq_prior: a one-element named list specifying the prior to be used when generating Dirichlet parameters for genotype likelihood calculations. The element name selects the method; valid methods include "const", "const_scaled" (the default, as shown in the usage above), and "empirical". See ?list_diploid_params for method details.

  • summ: logical indicating whether summary descriptions of the formatted data should be printed.

  • ploidies: a named vector of ploidies (1 or 2) for each locus. The names must be the locus names.
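
As a rough illustration of the input layout described above, the following base R sketch builds a toy data frame and a matching ploidies vector, then checks them against the stated requirements. The data values, the locus names Loc1 and Loc2, and the layout_ok check are all made up for illustration; none of this is part of the package, and the package may validate its input differently.

```r
# Toy two-column genetic data (hypothetical values): four metadata columns,
# followed by two loci with two columns each.
D <- data.frame(
  sample_type = c("reference", "reference", "mixture"),
  repunit     = c("north", "south", "north"),
  collection  = c("river_a", "river_b", "mix_1"),
  indiv       = c("fish_1", "fish_2", "fish_3"),
  Loc1        = c(101L, 101L, 103L),
  Loc1.1      = c(103L, 101L, 103L),
  Loc2        = c(7L, 9L, 7L),
  Loc2.1      = c(7L, 9L, 9L),
  stringsAsFactors = FALSE
)
gen_start_col <- 5

# Locus names come from the header of the first column in each pair.
locnames <- names(D)[gen_start_col:ncol(D)][c(TRUE, FALSE)]

# All toy loci are treated as diploid; names must match the locus names.
ploidies <- rep(2, length(locnames))
names(ploidies) <- locnames

# Hypothetical sanity checks mirroring the argument descriptions.
meta <- D[, seq_len(gen_start_col - 1), drop = FALSE]
required <- c("indiv", "collection", "repunit", "sample_type")
layout_ok <- all(required %in% names(meta)) &&   # required metadata present
  !anyNA(meta) &&                                # no NAs in metadata
  !anyDuplicated(meta$indiv) &&                  # unique individual IDs
  (ncol(D) - gen_start_col + 1) %% 2 == 0 &&     # genetic columns come in pairs
  !any(grepl(" ", locnames)) &&                  # no spaces in locus names
  all(ploidies %in% c(1, 2)) &&                  # ploidy is 1 or 2
  setequal(names(ploidies), locnames)            # ploidies named by locus
```

With an input that passes these checks, the call would take the form tcf2param_list(D, gen_start_col, ploidies = ploidies).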

Returns

tcf2param_list returns the output of list_diploid_params, after the original dataset is converted to a usable format and all relevant values are extracted. See ?list_diploid_params for details.

Details

In order for all steps in conversion to be carried out successfully, the dataset must have "repunit", "collection", "indiv", and "sample_type" columns preceding two-column genetic data. If summ == TRUE, the function prints summary statistics describing the structure of the dataset, as well as the presence of missing data, enabling verification of proper data conversion.
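
To give a feel for the kind of structural check such a summary supports, here is a minimal base R sketch that tabulates individuals by sample type and counts missing genotypes per locus in a toy two-column data frame. It is not the package's actual summary output; the data, the variable names, and the rule that a genotype counts as missing when either gene copy is NA are assumptions made for illustration.

```r
# Toy two-column genetic data (hypothetical values) with some missing genotypes.
D <- data.frame(
  sample_type = c("reference", "reference", "mixture", "mixture"),
  repunit     = c("north", "south", "north", "south"),
  collection  = c("river_a", "river_b", "mix_1", "mix_1"),
  indiv       = paste0("fish_", 1:4),
  Loc1        = c(101L, 101L, NA, 103L),
  Loc1.1      = c(103L, 101L, NA, 103L),
  Loc2        = c(7L, NA, 7L, 9L),
  Loc2.1      = c(7L, NA, 9L, 9L),
  stringsAsFactors = FALSE
)
gen_start_col <- 5

# Split the genetic columns into the first and second column of each pair.
gen_cols    <- gen_start_col:ncol(D)
first_cols  <- gen_cols[c(TRUE, FALSE)]
second_cols <- gen_cols[c(FALSE, TRUE)]

# Number of individuals in each sample type.
n_by_type <- table(D$sample_type)

# Assumption: a genotype is missing if either gene copy is NA.
missing_geno <- is.na(D[first_cols]) | is.na(D[second_cols])

# Missing genotypes per locus, named by the first column of each pair.
n_missing <- colSums(missing_geno)

print(n_by_type)
print(n_missing)
```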

Examples

# after adding support for haploid markers we need to pass
# in the ploidies vector. These markers are all diploid...
locnames <- names(alewife)[-(1:16)][c(TRUE, FALSE)]
ploidies <- rep(2, length(locnames))
names(ploidies) <- locnames
ale_par_list <- tcf2param_list(alewife, 17, ploidies = ploidies)
  • Maintainer: Eric C. Anderson
  • License: CC0
  • Last published: 2024-01-24
