source_data function

Load plain-text data and RData from a URL (either http or https)

source_data loads plain-text or RDATA formatted data stored at a URL (both http and https) into R.

Source

Originally based on source_url from Hadley Wickham's devtools package.

source_data(url, rdata, sha1 = NULL, cache = FALSE, clearCache = FALSE, sep = "auto", header = "auto", stringsAsFactors = FALSE, envir = parent.frame(), ...)

Arguments

  • url: The data's URL. To distinguish between plain-text and RDATA the url must end in a distinguishing file extension.
  • rdata: logical. Whether or not the data set is an .RDATA file. If not specified, then source_data will attempt to determine whether or not the file is an .RDATA file from the URL's extension.
  • sha1: Character string of the file's SHA-1 hash, generated by source_data. Note if you are using data stored using Git, this is not the file's commit SHA-1 hash.
  • cache: logical. Whether or not to cache the data so that it is not downloaded every time the function is called.
  • clearCache: logical. Whether or not to clear the downloaded data from the cache.
  • sep: The separator method for the plain-text data. For example, to load comma-separated values data (CSV) use sep = ",". To load tab-separated values data (TSV) use sep = "\t". Only relevant for plain-text data.
  • header: Logical, whether or not the first line of the file is the header (i.e. variable names).
  • stringsAsFactors: logical. Convert all character columns to factors?
  • envir: the environment where the data should be loaded.
  • ...: additional arguments passed to fread or load as relevant.
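
The url and rdata arguments describe a fallback in which the file type is read from the URL's extension. A minimal sketch of that idea follows. The helper name is_rdata_url and the set of extensions it accepts are assumptions made for illustration; this is not the package's actual implementation.

```r
# Hypothetical helper (not part of the package): guess whether a URL
# points at an RData file by inspecting its file extension.
is_rdata_url <- function(url) {
  # Drop any query string or fragment before reading the extension
  path <- sub("[?#].*$", "", url)
  # Compare case-insensitively; the accepted extensions are an assumption
  ext <- tolower(tools::file_ext(path))
  ext %in% c("rdata", "rda")
}

is_rdata_url("https://example.com/data/survey.RData")
is_rdata_url("https://example.com/data/survey.csv")
```

A URL without a distinguishing extension gives this kind of check nothing to work with, which is the case where setting rdata explicitly is useful.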

Returns

a data frame

Details

Loads plain-text data (e.g. CSV, TSV) or RDATA from a URL. Works with both HTTP and HTTPS sites. Note: the URL you give for the url argument must be for the RAW version of the file. The function should work to download plain-text data from any secure URL (https), though I have not verified this.

From the source_url documentation: "If a SHA-1 hash is specified with the sha1 argument, then this function will check the SHA-1 hash of the downloaded file to make sure it matches the expected value, and throw an error if it does not match. If the SHA-1 hash is not specified, it will print a message displaying the hash of the downloaded file. The purpose of this is to improve security when running remotely-hosted code; if you have a hash of the file, you can be sure that it has not changed."
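
The check described in the quotation can be sketched locally without downloading anything. The helper check_sha1 below is hypothetical, and it assumes the digest package is installed; source_data may compute and compare the hash differently.

```r
# Hypothetical helper mirroring the described behaviour; assumes the
# 'digest' package is installed.
check_sha1 <- function(path, sha1 = NULL) {
  # Hash the file's contents (file = TRUE), not the path string itself
  file_sha1 <- digest::digest(path, algo = "sha1", file = TRUE)
  if (is.null(sha1)) {
    # No expected hash supplied: report the hash of the file
    message("SHA-1 hash of file is ", file_sha1)
  } else if (!identical(tolower(sha1), file_sha1)) {
    # Expected hash supplied but different: stop with an error
    stop("SHA-1 hash of file (", file_sha1,
         ") does not match expected value (", sha1, ")")
  }
  invisible(file_sha1)
}

# Stand-in for a downloaded file
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), tmp)

hash <- check_sha1(tmp)        # no expected hash: prints the file's hash
check_sha1(tmp, sha1 = hash)   # matching hash: passes silently
```

In practice the hash printed on a first, trusted download is the value to pass as sha1 on later calls.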

Examples

## Not run:
# Download electoral disproportionality data stored on GitHub
# Note: Using shortened URL created by bitly
DisData <- source_data("http://bit.ly/156oQ7a")

# Check to see if SHA-1 hash matches downloaded file
DisDataHash <- source_data("http://bit.ly/Ss6zDO",
    sha1 = "dc8110d6dff32f682bd2f2fdbacb89e37b94f95d")
## End(Not run)

See Also

httr, fread, and load

  • Maintainer: Christopher Gandrud
  • License: GPL (>= 3)
  • Last published: 2016-02-07