extractor function

Extract reads and output from Kraken

Usage

extract_taxids(
  kraken_report,
  taxon = c("d__Bacteria", "d__Fungi", "d__Viruses")
)

extract_kraken_output(
  kraken_out,
  taxids,
  odir,
  ofile = "kraken_microbiome_output.txt",
  ...
)

extract_kraken_reads(
  kraken_out,
  reads,
  ofile = NULL,
  odir = getwd(),
  threads = NULL,
  ...,
  envpath = NULL,
  seqkit = NULL
)

Arguments

  • kraken_report: Path to the Kraken report file.
  • taxon: An atomic character vector specifying the taxa wanted. Each name should follow the Kraken style: the rank code, two underscores, and the scientific name of the taxon (e.g., "d__Viruses").
  • kraken_out: Path to the Kraken output file.
  • taxids: A character vector of NCBI taxonomy identifiers to extract.
  • odir: A string giving the directory in which to save ofile.
  • ofile: A string giving the file in which to save the Kraken output for the specified taxids.
  • ...:
    • extract_kraken_output: Additional arguments passed to sink_csv().
    • extract_kraken_reads: Additional arguments passed to the cmd_run() method.
  • reads: The original FASTQ files (the input to kraken2). You can pass the two paired-end files directly.
  • threads: Number of threads to use; see blit::cmd_help(blit::seqkit("grep")).
  • envpath: A string giving a path to add to the PATH environment variable.
  • seqkit: A string giving the path to the seqkit command.
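
The taxon naming convention described above (rank code, two underscores, scientific name) can be illustrated with a small base R sketch. The helper `parse_taxon()` is hypothetical and not part of this package; it only shows how such labels decompose, and the regular expression is an assumption about what counts as a rank code.

```r
# Hypothetical helper (not part of the package): split Kraken-style taxon
# labels such as "d__Viruses" into a rank code and a scientific name.
parse_taxon <- function(taxon) {
  # Assumed pattern: letters (optionally followed by digits), "__", then a name
  m <- regmatches(taxon, regexec("^([A-Za-z]+[0-9]*)__(.+)$", taxon))
  ok <- lengths(m) == 3L
  if (!all(ok)) {
    stop(
      "not in '<rank code>__<name>' form: ",
      paste(taxon[!ok], collapse = ", ")
    )
  }
  data.frame(
    rank = vapply(m, `[`, character(1), 2L),
    name = vapply(m, `[`, character(1), 3L),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_taxon(c("d__Bacteria", "d__Fungi", "d__Viruses"))
```

A label without the double underscore, such as "Viruses", makes the helper stop with an error, which is one way to catch a malformed `taxon` value before it reaches `extract_taxids()`.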

Returns

  • extract_taxids: An atomic character vector of taxon identifiers.

  • extract_kraken_output: A polars DataFrame.

  • extract_kraken_reads: The exit status, invisibly.

Examples

## Not run:
# For 10x Genomics data, `fq1` only contains the barcode and UMI, but the
# official documentation doesn't give any information about this. Because of
# this, I prefer using `umi-tools` to transform the `umi` into fq2 and then
# run `rsahmi` with only fq2.
blit::kraken2(
  fq1 = fq1,
  fq2 = fq2,
  classified_out = "classified.fq",
  # Number of threads to use
  blit::arg("--threads", 10L, format = "%d"),
  # the kraken database
  blit::arg("--db", kraken_db),
  "--use-names", "--report-minimizer-data",
) |>
  blit::cmd_run()

# `kraken_report` should be the output of `blit::kraken2()`
taxids <- extract_taxids(kraken_report = "kraken_report.txt")

# 1. `kraken_out` should be the output of `blit::kraken2()`
# 2. `taxids` should be the output of `extract_taxids()`
# 3. `odir`: the output directory
extract_kraken_output(
  kraken_out = "kraken_output.txt",
  taxids = taxids,
  odir = # specify the output directory
)

# 1. `kraken_out` should be the output of `extract_kraken_output()`
# 2. `fq1` and `fq2` should be the same as those passed to `blit::kraken2()`
extract_kraken_reads(
  kraken_out = "kraken_microbiome_output.txt",
  reads = c(fq1, fq2),
  threads = 10L, # Number of threads to use
  # Change the `seqkit` argument to your seqkit path. If `NULL`, it will be
  # looked up in your `PATH` environment variable
  seqkit = NULL
)
## End(Not run)
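
As a rough illustration of what selecting Kraken output rows by taxid involves, the base R sketch below filters a few made-up output lines. It is not the package's implementation (which returns a polars DataFrame); the helper `filter_kraken_rows()` and the sample reads are hypothetical. It assumes the standard Kraken2 output layout, where the third tab-separated column holds the taxon, written as "name (taxid N)" when `--use-names` is set.

```r
# Hypothetical helper (not part of the package): keep Kraken2 output lines
# whose third column matches one of `taxids`.
filter_kraken_rows <- function(lines, taxids) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  taxon_col <- vapply(fields, function(x) x[3], character(1))
  # With --use-names the column looks like "Escherichia coli (taxid 562)";
  # without it the column is just "562", which sub() leaves unchanged.
  ids <- sub("^.*\\(taxid ([0-9]+)\\)$", "\\1", taxon_col)
  lines[ids %in% as.character(taxids)]
}

# Made-up example lines in Kraken2 output format
demo <- c(
  "C\tread1\tEscherichia coli (taxid 562)\t100\t562:66",
  "U\tread2\tunclassified (taxid 0)\t100\t0:66",
  "C\tread3\tHomo sapiens (taxid 9606)\t100\t9606:66"
)

kept <- filter_kraken_rows(demo, taxids = c("562", "10239"))
```

Here only the `read1` line is retained, since 562 is the only requested taxid present in the sample lines.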

See Also

https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown

  • Maintainer: Yun Peng
  • License: MIT + file LICENSE
  • Last published: 2025-03-24