In the low microbiome biomass setting, real microbes also show proportional numbers of total k-mers, unique k-mers, and total assigned sequencing reads across samples. That is, the following three Spearman correlations are positive and significant when tested on the sample-level data provided in the Kraken reports: cor(minimizer_len, minimizer_n_unique), cor(minimizer_len, total_reads), and cor(total_reads, minimizer_n_unique) (r1 > 0 & r2 > 0 & r3 > 0 & p1 < 0.05 & p2 < 0.05 & p3 < 0.05).
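As a minimal sketch of the criterion above, the three tests can be run for a single taxon with base R's `cor.test()`. The data frame `taxon`, its values, and the helper `cor_pair()` are hypothetical and only illustrate the idea; this is not the package's internal implementation.

```r
# Toy per-sample values for ONE hypothetical taxon (made-up numbers).
taxon <- data.frame(
  total_reads        = c(12, 30, 45, 80, 150, 210),
  minimizer_len      = c(40, 95, 160, 250, 480, 700),
  minimizer_n_unique = c(25, 60, 90, 170, 300, 410)
)

# Hypothetical helper: run one correlation test and keep only the
# coefficient (r) and the p-value (p).
cor_pair <- function(x, y, method = "spearman") {
  res <- stats::cor.test(x, y, method = method)
  c(r = unname(res$estimate), p = res$p.value)
}

c1 <- cor_pair(taxon$minimizer_len, taxon$minimizer_n_unique) # r1, p1
c2 <- cor_pair(taxon$minimizer_len, taxon$total_reads)        # r2, p2
c3 <- cor_pair(taxon$total_reads, taxon$minimizer_n_unique)   # r3, p3

# The taxon passes only if all three correlations are positive
# and all three p-values are below 0.05.
is_real <- all(c(c1["r"], c2["r"], c3["r"]) > 0) &&
  all(c(c1["p"], c2["p"], c3["p"]) < 0.05)
```

With only a handful of samples a Spearman test has little power, which is one reason a minimum number of samples per taxid is required before testing.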
kreports: A list of kreport data, one element per sample, as returned by prep_dataset().
method: A character string indicating which correlation coefficient is to be used for the test. One of "pearson", "kendall", or "spearman", can be abbreviated.
...: Other arguments passed to cor.test.
min_reads: An integer, the minimum number of total reads required to keep a taxon. SAHMI uses 2.
min_minimizer_n_unique: An integer, the minimum number of unique minimizers required to keep a taxon. SAHMI uses 2.
min_number: An integer, the minimum number of samples per taxid. SAHMI uses 4.
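The three thresholds can be pictured with a small base R sketch. The table `kr` and its values are invented, and both the inclusive comparison (`>=`) and the order of the steps are assumptions made for illustration; the page does not state how slsd() applies them internally.

```r
# Toy long-format table: one row per (sample, taxid); values are made up.
kr <- data.frame(
  sample             = rep(paste0("s", 1:5), times = 2),
  taxid              = rep(c("562", "1280"), each = 5),
  total_reads        = c(5, 9, 1, 14, 20, 3, 1, 1, 7, 1),
  minimizer_n_unique = c(4, 8, 1, 11, 15, 3, 1, 2, 6, 1)
)

min_reads <- 2L
min_minimizer_n_unique <- 2L
min_number <- 4L

# Assumption: thresholds are inclusive (>=) and applied per row first.
keep <- kr$total_reads >= min_reads &
  kr$minimizer_n_unique >= min_minimizer_n_unique
kr <- kr[keep, ]

# Count the remaining samples per taxid and keep only taxids that still
# have at least `min_number` samples to correlate.
n_per_taxid <- table(kr$taxid)
tested_taxids <- names(n_per_taxid)[n_per_taxid >= min_number]
```

In this toy table only taxid "562" keeps four samples after filtering, so it is the only one that would go on to the correlation tests.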
Returns
A polars DataFrame containing the correlation coefficient and p-value for cor(minimizer_len, minimizer_n_unique) (r1 and p1), cor(minimizer_len, total_reads) (r2 and p2), and cor(total_reads, minimizer_n_unique) (r3 and p3).
Examples
## Not run:
# `sahmi_datasets` should be the output of all samples from `prep_dataset()`
slsd <- slsd(lapply(sahmi_datasets, `[[`, "kreport"))
real_taxids_slsd <- slsd$filter(
  pl$col("r1")$gt(0),
  pl$col("r2")$gt(0),
  pl$col("r3")$gt(0),
  pl$col("p1")$lt(0.05),
  pl$col("p2")$lt(0.05),
  pl$col("p3")$lt(0.05)
)$get_column("taxid")
## End(Not run)