Identifying contaminants and false-positive taxa (cell line quantile test)
remove_contaminants(
  kraken_reports,
  study = "current study",
  taxon = c("d__Bacteria", "d__Fungi", "d__Viruses"),
  quantile = 0.95,
  alpha = 0.05,
  alternative = "greater",
  exclusive = FALSE
)
Arguments
kraken_reports: A character vector of paths to all kraken report files.
study: A string giving the study name, used to distinguish the current study from the cell line data.
taxon: An atomic character vector specifying the taxa of interest. Each name should follow the kraken style: the rank code, two underscores, and the scientific name of the taxon (e.g., "d__Viruses").
quantile: Probabilities with values in [0, 1] specifying the quantile to calculate.
alpha: Level of significance.
alternative: A string specifying the alternative hypothesis, must be one of "two.sided", "greater" (default) or "less". You can specify just the initial letter.
exclusive: A boolean value indicating whether taxa not found in the cell line data should be regarded as true taxa (that is, included in truly). Default: FALSE.
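To illustrate how quantile, alpha, and alternative interact, here is a minimal base R sketch of one common form of a one-sample quantile test, the binomial version. It is an assumption that remove_contaminants behaves this way internally; the abundance values and the variable names (cellline, study, q0, n_above) are invented for illustration.

```r
# Hypothetical abundances for a single taxon (made-up numbers, not real data).
cellline <- c(0.1, 0.3, 0.2, 0.5, 0.4, 0.2, 0.6, 0.3, 0.1, 0.4)
study <- c(12, 15, 9, 20, 18, 14, 16, 11)

quantile_level <- 0.95
alpha <- 0.05

# Reference value: the chosen quantile of the cell line distribution.
q0 <- unname(stats::quantile(cellline, probs = quantile_level))

# Binomial form of a one-sample quantile test (an assumed formulation):
# under the null hypothesis, a study value exceeds q0 with probability
# 1 - quantile_level, so many exceedances are evidence for "greater".
n_above <- sum(study > q0)
test <- stats::binom.test(
  n_above,
  n = length(study),
  p = 1 - quantile_level,
  alternative = "greater"
)

pvalue <- test$p.value
significant <- pvalue < alpha
```

With alternative = "greater", a small p-value suggests the taxon is more abundant in the study than the chosen cell line quantile would explain, which is the situation in which it would not be flagged as a contaminant.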
Returns
A polars DataFrame with the following attributes:
pvalues: Quantile test p-values.
exclusive: taxids found in the current study but not in the cell line data.
significant: taxids with pvalues < alpha.
truly: taxids regarded as true, based on alpha and exclusive. If exclusive is TRUE, this is the union of exclusive and significant; otherwise, it is the same as significant.
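The relationship between significant, exclusive, and truly described above can be sketched in base R as follows. The taxids, p-values, and the helper derive_truly are hypothetical and only mirror the documented rule; they are not part of the package.

```r
# Hypothetical p-values keyed by taxid (illustrative values only).
pvalues <- c("562" = 0.001, "1280" = 0.20, "5476" = 0.03)
# Hypothetical taxids seen in the study but absent from the cell line data.
exclusive_taxids <- c("10376")
alpha <- 0.05

# Significant taxids are those with a p-value below alpha.
significant <- names(pvalues)[pvalues < alpha]

# Hypothetical helper mirroring the documented rule for `truly`.
derive_truly <- function(significant, exclusive_taxids, exclusive = FALSE) {
  if (exclusive) union(exclusive_taxids, significant) else significant
}

truly_default <- derive_truly(significant, exclusive_taxids)
truly_exclusive <- derive_truly(significant, exclusive_taxids, exclusive = TRUE)
```

Here truly_default contains only the significant taxids, while truly_exclusive additionally includes the taxid that has no cell line counterpart.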