taxdir: character, directory where the taxonomy files are kept.
id: numeric, taxonomic ID(s) of the nodes of interest.
nodes: dataframe, output from getnodes (optional).
rank: character, name of the taxonomic rank of interest.
names: dataframe, output from getnames (optional).
Details
These functions provide a convenient way to read data from NCBI taxonomy files (i.e., the contents of taxdump.tar.gz, which is available from https://ftp.ncbi.nih.gov/pub/taxonomy/).
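As a rough sketch of how the full taxonomy files might be obtained, the snippet below downloads the archive and extracts the two files these functions read. It uses only base R; the destination directory is a hypothetical choice, and the archive URL is assumed to be the directory given above followed by taxdump.tar.gz.

```r
# Assumption: the archive sits at the directory URL above plus "taxdump.tar.gz"
url <- "https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"

# Hypothetical destination; any writable directory would do
taxdir <- file.path(tempdir(), "taxonomy")
dir.create(taxdir, showWarnings = FALSE)
tarfile <- file.path(taxdir, "taxdump.tar.gz")

# The archive is large, so the default download timeout may be too short
options(timeout = max(600, getOption("timeout")))
download.file(url, tarfile, mode = "wb")

# Extract only the two files that getnodes and getnames read
untar(tarfile, files = c("nodes.dmp", "names.dmp"), exdir = taxdir)
file.exists(file.path(taxdir, c("nodes.dmp", "names.dmp")))
```

The resulting taxdir can then be passed as the taxdir argument of the functions described here.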
The taxdir argument specifies the directory where the nodes.dmp and names.dmp files are located. getnodes and getnames read these files into data frames. getrank returns the rank (species, genus, etc.) of the node with the given taxonomic ID. parent returns the taxonomic ID of the parent of the node specified by id, that is, the next node toward the root of the tree; if rank is supplied, the function instead moves toward the root until it finds a node with that rank. allparents returns the taxonomic IDs of all nodes between the one specified by id and the root of the tree, inclusive. sciname returns the scientific name of the node with the given ID.
The id argument can have length greater than 1 for every function except allparents. If getrank, parent, allparents or sciname need to be called repeatedly, the calls can be sped up by supplying the output of getnodes in the nodes argument and/or the output of getnames in the names argument.
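To make the tree-walking behaviour concrete, here is a minimal, self-contained sketch of the idea behind parent and allparents. It is not the package's implementation: the helpers toy_parent and toy_allparents are hypothetical, the column names id, parent and rank are assumed for illustration, and the lineage is drastically shortened (only the IDs for Homo sapiens, Homo, Chordata and the root are kept). It also assumes, as in nodes.dmp, that the root node lists itself as its own parent.

```r
# Toy stand-in for a nodes data frame; the column names are an assumption,
# and the lineage is abbreviated for illustration
nodes <- data.frame(
  id     = c(1, 7711, 9605, 9606),
  parent = c(1, 1, 7711, 9605),
  rank   = c("no rank", "phylum", "genus", "species"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: take one step toward the root, or, if rank is given,
# keep stepping until a node with that rank (or the root) is reached
toy_parent <- function(id, nodes, rank = NULL) {
  step <- function(x) nodes$parent[match(x, nodes$id)]
  if (is.null(rank)) return(step(id))
  current <- id
  while (nodes$rank[match(current, nodes$id)] != rank &&
         step(current) != current) {
    current <- step(current)
  }
  current
}

# Hypothetical helper: every ID from the starting node to the root, inclusive
toy_allparents <- function(id, nodes) {
  path <- id
  repeat {
    last <- path[length(path)]
    up <- nodes$parent[match(last, nodes$id)]
    if (up == last) break  # the root is its own parent
    path <- c(path, up)
  }
  path
}

toy_parent(9606, nodes)            # 9605
toy_parent(9606, nodes, "phylum")  # 7711
toy_allparents(9606, nodes)        # 9606 9605 7711 1
```

Because the sketch works on a data frame already held in memory, it also shows why passing a preloaded nodes table avoids re-reading the file on every call.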
Examples
## Get information about Homo sapiens from the
## packaged taxonomy files
taxdir <- system.file("extdata/taxonomy", package = "CHNOSZ")
# H. sapiens' taxonomic id
id1 <- 9606
# That is a species
getrank(id1, taxdir)
# The next step up the taxonomy
id2 <- parent(id1, taxdir)
print(id2)
# That is a genus
getrank(id2, taxdir)
# That genus is "Homo"
sciname(id2, taxdir)
# We can ask what phylum is it part of?
id3 <- parent(id1, taxdir, "phylum")
# Answer: "Chordata"
sciname(id3, taxdir)
# H. sapiens' complete taxonomy
id4 <- allparents(id1, taxdir)
sciname(id4, taxdir)

## The names of the organisms in the supplied taxonomy files
taxdir <- system.file("extdata/taxonomy", package = "CHNOSZ")
id5 <- c(83333, 4932, 9606, 186497, 243232)
sciname(id5, taxdir)
# These are not all species, though
# (those with "no rank" are something like strains,
# e.g. Escherichia coli K-12)
getrank(id5, taxdir)
# Find the species for each of these
id6 <- sapply(id5, function(x) parent(x, taxdir = taxdir, rank = "species"))
unique(getrank(id6, taxdir)) # "species"
# Note that the K-12 is dropped
sciname(id6, taxdir)

## The complete nodes.dmp and names.dmp files are quite large,
## so it helps to store them in memory when performing multiple queries
## (this doesn't have a noticeable speed-up for the small files in this example)
taxdir <- system.file("extdata/taxonomy", package = "CHNOSZ")
nodes <- getnodes(taxdir = taxdir)
# All of the node ids in this file
id7 <- nodes$id
# All of the non-leaf nodes
id8 <- unique(parent(id7, nodes = nodes))
names <- getnames(taxdir = taxdir)
sciname(id8, names = names)