extractor() R function from [rsahmi]

Extract reads and output from Kraken


extract_taxids(
  kraken_report,
  taxon = c("d__Bacteria", "d__Fungi", "d__Viruses")
)

extract_kraken_output(
  kraken_out,
  taxids,
  odir,
  ofile = "kraken_microbiome_output.txt",
  ...
)

extract_kraken_reads(
  kraken_out,
  reads,
  ofile = NULL,
  odir = getwd(),
  threads = NULL,
  ...,
  envpath = NULL,
  seqkit = NULL
)

Arguments

kraken_report: Path to the Kraken report file.
taxon: A character vector specifying the taxa to extract. Each name should follow the Kraken style: the rank code, two underscores, and the scientific name of the taxon (e.g., "d__Viruses").
kraken_out: Path to the Kraken output file.
taxids: A character vector of NCBI taxonomy identifiers to extract.
odir: A string giving the directory in which to save ofile.
ofile: A string giving the name of the file in which to save the Kraken output for the specified taxids.
...:
- extract_kraken_output: Additional arguments passed to sink_csv().
- extract_kraken_reads: Additional arguments passed to the cmd_run() method.
reads: The original FASTQ files (the input to Kraken2). For paired-end data, you can pass both files directly.
threads: Number of threads to use; see blit::cmd_help(blit::seqkit("grep")).
envpath: A string giving a path to add to the PATH environment variable.
seqkit: A string giving the path to the seqkit command.
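
The Kraken-style naming expected by taxon can be illustrated with a short base R sketch. The helper make_kraken_taxon() is hypothetical and not part of rsahmi; it only shows how a rank code, two underscores, and a scientific name are joined, and how such a name can be split again.

```r
# Hypothetical helper (not part of rsahmi): build a Kraken-style taxon name
# from a rank code and a scientific name, joined by two underscores.
make_kraken_taxon <- function(rank_code, name) {
    paste0(rank_code, "__", name)
}

# Reproduce the names used as the default of `taxon`
taxon <- make_kraken_taxon("d", c("Bacteria", "Fungi", "Viruses"))
print(taxon)

# Split each Kraken-style name back into its rank code and scientific name
parts <- strsplit(taxon, "__", fixed = TRUE)
rank_codes <- vapply(parts, `[`, character(1), 1L)
sci_names <- vapply(parts, `[`, character(1), 2L)
```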

Returns

extract_taxids: An atomic character vector of taxon identifiers.
extract_kraken_output: A polars DataFrame.
extract_kraken_reads: The exit status, invisibly.

Examples


## Not run:

# For 10x Genomics data, `fq1` contains only the barcode and UMI, but the
# official documentation gives no guidance on this. I therefore prefer using
# `umi-tools` to transfer the UMI into fq2 and then running `rsahmi` with
# only fq2.
blit::kraken2(
    fq1 = fq1,
    fq2 = fq2,
    classified_out = "classified.fq",
    # Number of threads to use
    blit::arg("--threads", 10L, format = "%d"),
    # the kraken database
    blit::arg("--db", kraken_db),
    "--use-names", "--report-minimizer-data",
) |> blit::cmd_run()

# `kraken_report` should be the output of `blit::kraken2()`
taxids <- extract_taxids(kraken_report = "kraken_report.txt")

# 1. `kraken_out` should be the output of `blit::kraken2()`
# 2. `taxids` should be the output of `extract_taxids()`
# 3. `odir`: the output directory
extract_kraken_output(
    kraken_out = "kraken_output.txt",
    taxids = taxids,
    odir = getwd() # replace with your output directory
)

# 1. `kraken_out` should be the output of `extract_kraken_output()`
# 2. `fq1` and `fq2` should be the same files passed to `blit::kraken2()`
extract_kraken_reads(
    kraken_out = "kraken_microbiome_output.txt",
    reads = c(fq1, fq2),
    threads = 10L, # Number of threads to use
    # Set `seqkit` to the path of your seqkit command. If `NULL`, it is
    # looked up in your `PATH` environment variable
    seqkit = NULL
)
## End(Not run)
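
To make the taxid-filtering step more concrete, here is a minimal base R sketch of the idea on made-up data. It is not rsahmi's implementation (extract_kraken_output() returns a polars DataFrame); it assumes the standard tab-separated Kraken2 output layout and the "name (taxid N)" form that the third column takes with --use-names. All read IDs and values are invented for illustration.

```r
# Toy Kraken2-style output (tab-separated): classification status, read ID,
# taxon name with taxid (as written with `--use-names`), read length, LCA map.
# Every value here is made up for illustration.
kraken_lines <- c(
    "C\tread1\tEscherichia coli (taxid 562)\t100\t562:66",
    "C\tread2\tHomo sapiens (taxid 9606)\t100\t9606:66",
    "U\tread3\tunclassified (taxid 0)\t100\t0:66"
)
kraken <- read.delim(
    text = kraken_lines, header = FALSE, stringsAsFactors = FALSE,
    col.names = c("status", "read_id", "taxon", "length", "lca")
)

# Pull the taxid out of the "name (taxid N)" field, kept as character
kraken$taxid <- sub(".*\\(taxid ([0-9]+)\\)$", "\\1", kraken$taxon)

# Keep only the reads assigned to the taxids of interest
taxids <- c("562")
microbiome <- kraken[kraken$taxid %in% taxids, ]
print(microbiome$read_id)
```

The IDs of the retained reads are what a tool such as seqkit grep would then use to pull the matching records from the original FASTQ files.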
