True taxa are detected in multiple barcodes, and their total and unique k-mer counts scale proportionally across those barcodes. This is tested as a significant Spearman correlation between the number of total and unique k-mers across barcodes (padj < 0.05).
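The sketch below illustrates the criterion with base R `cor.test()` on two hypothetical taxa. The per-barcode counts are invented for illustration, and this is not the package's own implementation. In taxon A, unique k-mers grow in step with total k-mers. In taxon B, they vary independently. Only taxon A produces a significant Spearman correlation.

```r
# Hypothetical per-barcode counts (illustrative numbers only):
# each position is one cell barcode.
total_kmers <- seq(10, 100, by = 10)

# Taxon A: unique k-mers grow in step with total k-mers.
unique_a <- total_kmers %/% 2
# Taxon B: unique k-mers vary independently of total k-mers.
unique_b <- c(5, 2, 9, 1, 7, 3, 10, 4, 8, 6)

# exact = FALSE uses the asymptotic t approximation; on real count data with
# ties, this also avoids the "cannot compute exact p-value" warning.
test_a <- cor.test(total_kmers, unique_a, method = "spearman", exact = FALSE)
test_b <- cor.test(total_kmers, unique_b, method = "spearman", exact = FALSE)

c(rho_a = unname(test_a$estimate), p_a = test_a$p.value)
c(rho_b = unname(test_b$estimate), p_b = test_b$p.value)
```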
method: A character string indicating which correlation coefficient is to be used for the test. One of "pearson", "kendall", or "spearman"; can be abbreviated.
...: Other arguments passed to cor.test.
p.adjust: P-value correction method, a character string; can be abbreviated. See p.adjust for details.
min_kmer_len: An integer, the minimum number of k-mers used to filter taxa. SAHMI uses 2.
min_number: An integer, the minimum number of cells per taxid. SAHMI uses 4.
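The following minimal sketch, written in base R rather than polars, shows one way these arguments could fit together. Several details are assumptions and are not the package's actual implementation:

- the input layout, a data frame with hypothetical columns `taxid`, `barcode`, `total`, and `unique`;
- the function name `blsd_sketch()`;
- the reading of `min_kmer_len` as a per-barcode threshold on total k-mers;
- the `"BH"` default for `p.adjust`.

The real `blsd()` takes `dataset$kmer` and returns a polars DataFrame.

```r
# Hypothetical sketch of barcode-level signal denoising; NOT the package's code.
# Input (assumed): one row per taxid x barcode with total and unique k-mer counts.
blsd_sketch <- function(kmer_counts, method = "spearman", p.adjust = "BH",
                        min_kmer_len = 2, min_number = 4, ...) {
  # Assumption: barcodes with fewer than `min_kmer_len` total k-mers are dropped.
  kept <- kmer_counts[kmer_counts$total >= min_kmer_len, ]
  per_taxon <- split(kept, kept$taxid)
  # Keep only taxa observed in at least `min_number` cells (barcodes).
  per_taxon <- per_taxon[vapply(per_taxon, nrow, integer(1)) >= min_number]
  if (length(per_taxon) == 0) {
    return(data.frame(taxid = character(), rho = numeric(),
                      pvalue = numeric(), padj = numeric()))
  }
  res <- lapply(names(per_taxon), function(id) {
    x <- per_taxon[[id]]
    # Ties in count data can trigger "cannot compute exact p-value" warnings;
    # they are silenced here. Extra arguments in `...` go to cor.test().
    ct <- suppressWarnings(cor.test(x$total, x$unique, method = method, ...))
    data.frame(taxid = id, rho = unname(ct$estimate), pvalue = ct$p.value)
  })
  out <- do.call(rbind, res)
  # R still finds the stats function p.adjust() here, because the string
  # argument of the same name is not a function.
  out$padj <- p.adjust(out$pvalue, method = p.adjust)
  out
}

# Usage with made-up counts: taxid 101 is proportional, 202 is noise,
# and 303 is seen in only 2 cells, so it is dropped by `min_number`.
kmer_counts <- data.frame(
  taxid   = rep(c(101, 202, 303), times = c(8, 8, 2)),
  barcode = paste0("cell", 1:18),
  total   = c(seq(10, 80, 10), seq(10, 80, 10), 5, 6),
  unique  = c(seq(5, 40, 5), c(7, 2, 9, 1, 8, 3, 6, 4), 3, 4)
)
res <- blsd_sketch(kmer_counts)
res[res$padj < 0.05, "taxid"]
```

Applying the `padj < 0.05` cutoff after adjustment mirrors the criterion stated in the description.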
Returns
A polars DataFrame
Examples
## Not run:
# 1. `sahmi_datasets` should be the output of all samples from `prep_dataset()`
# 2. `real_taxids_slsd` should be the output of `slsd()`
umi_list <- lapply(sahmi_datasets, function(dataset) {
    # barcode k-mer correlation test
    blsd_res <- blsd(dataset$kmer)
    real_taxids <- blsd_res$filter(pl$col("padj")$lt(0.05))$get_column("taxid")
    # only keep taxids that pass sample-level signal denoising
    real_taxids <- real_taxids$filter(real_taxids$is_in(real_taxids_slsd))
    # remove contaminants
    real_taxids <- real_taxids$filter(
        real_taxids$is_in(attr(truly_microbe, "truly"))
    )
    # filter UMI data
    dataset$umi$filter(pl$col("taxid")$is_in(real_taxids))
})
## End(Not run)