slsd() R function from [rsahmi]

Sample level signal denoising

In low-biomass microbiome settings, real microbes show proportional relationships across samples among the total number of k-mers, the number of unique k-mers, and the total number of assigned sequencing reads. That is, the following three Spearman correlations are positive and significant when tested on the sample-level data provided in Kraken reports: cor(minimizer_len, minimizer_n_unique), cor(minimizer_len, total_reads) and cor(total_reads, minimizer_n_unique). A taxon is treated as real when r1 > 0, r2 > 0, r3 > 0, p1 < 0.05, p2 < 0.05 and p3 < 0.05.
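
As an illustration of the criterion above, the following base R sketch runs the three Spearman tests for a single taxon. It is not the package's implementation: the per-sample values are made up, and the helper `spearman_pair()` and the variable `is_real` are hypothetical names introduced only for this example.

```r
# Made-up per-sample values for one taxon across 8 samples (not real data).
total_reads        <- c(4, 9, 15, 22, 31, 45, 60, 88)
minimizer_len      <- c(120, 300, 410, 700, 650, 1400, 1900, 2600)
minimizer_n_unique <- c(100, 240, 390, 520, 600, 1100, 1500, 2300)

# Hypothetical helper: Spearman coefficient and p-value for one pair.
spearman_pair <- function(x, y) {
  ct <- stats::cor.test(x, y, method = "spearman")
  c(r = unname(ct$estimate), p = ct$p.value)
}

# Rows follow the order of the three correlations described above.
res <- rbind(
  pair1 = spearman_pair(minimizer_len, minimizer_n_unique),
  pair2 = spearman_pair(minimizer_len, total_reads),
  pair3 = spearman_pair(total_reads, minimizer_n_unique)
)

# The taxon passes only if every coefficient is positive and significant.
is_real <- all(res[, "r"] > 0 & res[, "p"] < 0.05)

print(res)
print(is_real)
```

Requiring all three tests to pass is stricter than requiring any one of them, which is the point of the criterion: a contaminant may correlate on one pair by chance but is less likely to do so on all three.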


slsd(
  kreports,
  method = "spearman",
  ...,
  min_reads = 3L,
  min_minimizer_n_unique = 3L,
  min_number = 3L
)

Arguments

kreports: kreports data returned by prep_dataset() for all samples.
method: A character string indicating which correlation coefficient is to be used for the test. One of "pearson", "kendall", or "spearman", can be abbreviated.
...: Other arguments passed to cor.test.
min_reads: An integer, the minimum number of total reads used to filter taxa. SAHMI uses 2.
min_minimizer_n_unique: An integer, the minimum number of unique minimizers used to filter taxa. SAHMI uses 2.
min_number: An integer, the minimum number of samples per taxid. SAHMI uses 4.
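
One plausible reading of the three thresholds is sketched below in base R. This is an assumption, not the package's code: the input table `kreport_rows` and its values are invented, and whether slsd() uses `>=` or `>`, or filters rows before counting samples, is not stated in this page.

```r
# Invented long-format table: one row per (sample, taxid) pair.
kreport_rows <- data.frame(
  sample             = c("s1", "s2", "s3", "s4", "s1", "s2", "s3"),
  taxid              = c("562", "562", "562", "562", "1280", "1280", "1280"),
  total_reads        = c(10, 25, 40, 8, 5, 2, 6),
  minimizer_n_unique = c(9, 20, 35, 7, 4, 2, 5)
)

# Defaults from the usage section above.
min_reads <- 3L
min_minimizer_n_unique <- 3L
min_number <- 3L

# Assumed step 1: drop rows below the read or unique-minimizer threshold.
keep_row <- kreport_rows$total_reads >= min_reads &
  kreport_rows$minimizer_n_unique >= min_minimizer_n_unique
filtered <- kreport_rows[keep_row, ]

# Assumed step 2: keep taxa that still have enough samples to correlate.
n_samples <- table(filtered$taxid)
eligible <- names(n_samples)[n_samples >= min_number]

print(eligible)
```

Under these assumptions, taxid 1280 loses one sample in the first step and then falls below `min_number`, so only taxid 562 would go on to the correlation tests.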

Returns

A polars DataFrame of correlation coefficients and p-values for cor(minimizer_len, minimizer_n_unique) (r1 and p1), cor(minimizer_len, total_reads) (r2 and p2) and cor(total_reads, minimizer_n_unique) (r3 and p3).

Examples


## Not run:

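# `sahmi_datasets` should be the output of `prep_dataset()` for all samples
# `pl` is the polars namespace object from the polars R package
# The result is named `slsd_res` so it does not shadow the function `slsd()`
slsd_res <- slsd(lapply(sahmi_datasets, `[[`, "kreport"))

# Keep taxa whose three correlations are all positive and significant
real_taxids_slsd <- slsd_res$filter(
    pl$col("r1")$gt(0),
    pl$col("r2")$gt(0),
    pl$col("r3")$gt(0),
    pl$col("p1")$lt(0.05),
    pl$col("p2")$lt(0.05),
    pl$col("p3")$lt(0.05)
)$get_column("taxid")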
## End(Not run)

Maintainer: Yun Peng
License: MIT + file LICENSE
Last published: 2025-03-24
