XML2Obs function

Parse XML files into a list of "observations"

This function takes a collection of urls that point to XML files and coerces the relevant information into a list of observations. An "observation" is defined as a matrix with one row. An observation can also be thought of as a single instance of XML attributes (and value) for a particular level in the XML hierarchy. The names of the list reflect the XML node ancestry from which each observation was extracted.
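To make the shape of the return value concrete, here is a minimal base R sketch of a list of "observations". It does not call XML2Obs; the node paths, the "//" separator in the names, the attribute names, and the values are all made up for illustration.

```r
# Hypothetical data: two hand-built "observations".
# Each one is a character matrix with exactly one row; the column names
# stand in for XML attribute names. Nothing here comes from a real file.
obs <- list(
  "game//team" = matrix(
    c("nyy", "home"), nrow = 1,
    dimnames = list(NULL, c("id", "type"))
  ),
  "game//team//player" = matrix(
    c("123", "Smith"), nrow = 1,
    dimnames = list(NULL, c("id", "last"))
  )
)

# Every element has a single row, as the definition requires
rows_per_obs <- vapply(obs, nrow, integer(1))
print(rows_per_obs)

# The list names describe where in the hierarchy each observation came from
print(table(names(obs)))
```

Because each observation is a one-row matrix, observations that share a name can later be stacked row-wise into a single table.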

XML2Obs( urls, xpath, append.value = TRUE, as.equiv = TRUE, url.map = FALSE, local = FALSE, quiet = FALSE, ... )

Arguments

  • urls: character vector. Either urls that point to an XML file online or a local XML file name.
  • xpath: XML XPath expression that is passed to getNodeSet. If missing, the entire root and all descendants are captured and returned (i.e., xpath = "/").
  • append.value: logical. Should the XML value be appended for relevant observations?
  • as.equiv: logical. Should observations from two different files (but the same ancestry) have the same name returned?
  • url.map: logical. If TRUE, the 'url_key' column will contain a condensed url identifier (for each observation) and full urls will be stored in the "url_map" element. If FALSE, the full urls are included (for each observation) as a 'url' column and no "url_map" is included.
  • local: logical. Should urls be treated as paths to local files?
  • quiet: logical. If FALSE, the name of the file currently being parsed is printed. Set to TRUE to suppress this output.
  • ...: arguments passed along to httr::GET

Returns

A list of "observations" and (possibly) the "url_map" element.

Details

When url.map = TRUE, a "url_key" column is appended to each observation to help track its origin. The value of the "url_key" column is not the actual file name, but a simplified identifier that avoids repeating long file names for each observation. For this reason, an additional element (named "url_map") is added to the list of observations in case the actual file names are needed.
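A minimal sketch of inspecting the mapping, assuming the XML2R package is installed and that its bundled players.xml file (the one used in the Examples) parses with url.map = TRUE. The exact shape of the "url_map" element is not described on this page, so the sketch only prints its structure rather than assuming one.

```r
# Only runs if XML2R is available; otherwise the block is skipped.
if (requireNamespace("XML2R", quietly = TRUE)) {
  # Local example file shipped with the package
  players <- system.file("extdata", "players.xml", package = "XML2R")

  # Request condensed url identifiers plus the lookup element
  obs <- XML2R::XML2Obs(players, local = TRUE, url.map = TRUE, quiet = TRUE)

  # Inspect the lookup element without assuming its layout
  str(obs[["url_map"]])

  # Set the lookup aside before working with the observations themselves
  obs_only <- obs[names(obs) != "url_map"]
  print(colnames(obs_only[[1]]))
}
```

Separating "url_map" from the rest of the list first keeps later steps that loop over observations from tripping on an element that is not itself an observation.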

Examples

## Not run:
urls <- c("http://gd2.mlb.com/components/game/mlb/year_2013/mobile/346180.xml",
          "http://gd2.mlb.com/components/game/mlb/year_2013/mobile/346188.xml")
obs <- XML2Obs(urls)
table(names(obs))

# parses local files as well
players <- system.file("extdata", "players.xml", package = "XML2R")
obs2 <- XML2Obs(players, local = TRUE)
table(names(obs2))
## End(Not run)

See Also

urlsToDocs, docsToNodes, nodesToList, listsToObs

  • Maintainer: Carson Sievert
  • License: GPL (>= 2)
  • Last published: 2024-06-04