taxon: An atomic character vector specifying the taxon names wanted. Names should follow the kraken style: the rank code, two underscores, and the scientific name of the taxon (e.g., "d__Viruses").
kraken_out: The path to the kraken output file.
taxids: A character vector specifying the NCBI taxonomy identifiers to extract.
odir: A string giving the directory in which to save ofile.
ofile: A string giving the name of the file in which to save the kraken output for the specified taxids.
...:
* extract_kraken_output: Additional arguments passed to sink_csv().
* extract_kraken_reads: Additional arguments passed to the cmd_run() method.
reads: The original fastq files (the input to kraken2). You can pass two paired-end files directly.
threads: Number of threads to use, see blit::cmd_help(blit::seqkit("grep")).
envpath: A string giving a path to be added to the PATH environment variable.
seqkit: A string giving the path to the seqkit command.
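As a small illustration of the name format described for taxon, a kraken-style name can be assembled in base R from a rank code and a scientific name. The helper kraken_taxon() below is hypothetical and not part of this package; it only shows the "rank code, two underscores, scientific name" pattern.

```r
# Hypothetical helper (not part of the package): build a kraken-style
# taxon name as <rank code>__<scientific name>.
kraken_taxon <- function(rank_code, name) {
    paste0(rank_code, "__", name)
}

# A single taxon, matching the example given for `taxon`
viruses <- kraken_taxon("d", "Viruses")

# paste0() is vectorised, so several names can be built at once
domains <- kraken_taxon("d", c("Viruses", "Bacteria"))
```

The resulting character vector is the kind of value the taxon argument expects.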
Returns
extract_taxids: An atomic character vector of taxon identifiers.
extract_kraken_output: A polars DataFrame.
extract_kraken_reads: Exit status, invisibly.
Examples
## Not run:
# For 10x Genomics data, `fq1` only contains the barcode and UMI, but the
# official documentation gives no information on this. In this case, I prefer
# using `umi-tools` to transform the `umi` into fq2 and then run `rsahmi`
# with only fq2.
blit::kraken2(
    fq1 = fq1,
    fq2 = fq2,
    classified_out = "classified.fq",
    # Number of threads to use
    blit::arg("--threads", 10L, format = "%d"),
    # the kraken database
    blit::arg("--db", kraken_db),
    "--use-names", "--report-minimizer-data",
) |> blit::cmd_run()

# `kraken_report` should be the output of `blit::kraken2()`
taxids <- extract_taxids(kraken_report = "kraken_report.txt")

# 1. `kraken_out` should be the output of `blit::kraken2()`
# 2. `taxids` should be the output of `extract_taxids()`
# 3. `odir`: the output directory
extract_kraken_output(
    kraken_out = "kraken_output.txt",
    taxids = taxids,
    odir = # specify the output directory
)

# 1. `kraken_out` should be the output of `extract_kraken_output()`
# 2. `fq1` and `fq2` should be the same as those passed to `blit::kraken2()`
extract_kraken_reads(
    kraken_out = "kraken_microbiome_output.txt",
    reads = c(fq1, fq2),
    threads = 10L, # Number of threads to use
    # try to change the `seqkit` argument to your seqkit path. If `NULL`,
    # it will be detected in your `PATH` environment variable
    seqkit = NULL
)
## End(Not run)
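To make the filtering step more concrete, the following base-R sketch shows the general idea of keeping only the kraken output rows whose taxon identifier is in a given set. It is not how extract_kraken_output() is implemented (that function returns a polars DataFrame); it assumes kraken2 was run with "--use-names", so the third column has the form "name (taxid N)". The toy rows and the column names are invented for illustration.

```r
# Toy kraken2 standard output (tab-separated): classification status,
# read ID, taxon, read length, LCA mapping. Values are made up.
lines <- c(
    "C\tread1\tHomo sapiens (taxid 9606)\t100\t9606:66",
    "C\tread2\tEscherichia coli (taxid 562)\t100\t562:66",
    "U\tread3\tunclassified (taxid 0)\t100\t0:66"
)
kraken <- read.delim(
    text = lines, header = FALSE, quote = "",
    col.names = c("status", "read_id", "taxon", "length", "lca"),
    stringsAsFactors = FALSE
)

# Pull the numeric identifier out of "name (taxid N)"
kraken$taxid <- sub("^.*\\(taxid ([0-9]+)\\)$", "\\1", kraken$taxon)

# Keep only the rows assigned to the wanted identifiers
wanted <- c("562")
kept <- kraken[kraken$taxid %in% wanted, , drop = FALSE]
kept_ids <- kept$read_id
```

The read IDs of the retained rows are what a tool such as seqkit would then use to pull the matching reads from the original fastq files.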