src_impala function

Connect to Impala and create a remote dplyr data source

src_impala creates a SQL backend to dplyr for Apache Impala, the massively parallel processing query engine for Apache Hadoop.

src_impala can work with any DBI-compatible interface that provides connectivity to Impala. Currently, two packages that can provide this connectivity are odbc and RJDBC.

src_impala(drv, ..., auto_disconnect = TRUE)

Arguments

  • drv: an object that inherits from DBIDriver-class. For example, an object returned by odbc() or JDBC().

  • ...: arguments passed to the underlying Impala database connection method dbConnect. See dbConnect,OdbcDriver-method or dbConnect,JDBCDriver-method

  • auto_disconnect: Should the connection to Impala be automatically closed when the object returned by this function is deleted? Pass NA to auto-disconnect but print a message when this happens.
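As a rough illustration of auto_disconnect, the sketch below opens a connection that is not closed automatically, so the caller closes it explicitly. The helper name connect_manual is hypothetical, the connection values are placeholders taken from the Examples section, and the sketch assumes the implyr and odbc packages are installed and that an Impala daemon is reachable.

```r
# Hypothetical helper (not part of implyr): connect over ODBC with
# auto_disconnect = FALSE, leaving the caller responsible for closing
# the connection. All connection values are placeholders.
connect_manual <- function(host, uid, pwd, port = 21050) {
  implyr::src_impala(
    drv = odbc::odbc(),
    driver = "Cloudera ODBC Driver for Impala",
    host = host,
    port = port,
    database = "default",
    uid = uid,
    pwd = pwd,
    auto_disconnect = FALSE
  )
}
```

With this helper, a session might call impala <- connect_manual("host", "username", "password"), do its work, and then call implyr::dbDisconnect(impala) when finished. Leaving auto_disconnect at its default of TRUE makes the explicit disconnect unnecessary in most scripts.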

Returns

An object with class src_impala, src_sql, src
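Because the returned object is a dplyr source, it can be passed to tbl() and queried with dplyr verbs. The following is a minimal sketch under stated assumptions: the helper name mean_delay_by_carrier, the table name "flights", and the columns carrier and dep_delay are all hypothetical and stand in for whatever exists in your Impala database.

```r
# Hypothetical helper (not part of implyr): summarise a remote table in
# Impala and bring only the aggregated rows back into R.
# `src` is an object returned by src_impala(); the table and column
# names are assumed for illustration.
mean_delay_by_carrier <- function(src) {
  flights <- dplyr::tbl(src, "flights")
  summarised <- dplyr::summarise(
    dplyr::group_by(flights, carrier),
    mean_delay = mean(dep_delay, na.rm = TRUE)
  )
  # collect() runs the generated SQL on Impala and returns a local tibble
  dplyr::collect(summarised)
}
```

The grouping and averaging are translated to SQL and run by Impala, so only the summary rows are transferred to the R session.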

Examples

# Using ODBC connectivity:

## Not run:
library(odbc)

drv <- odbc::odbc()
impala <- src_impala(
  drv = drv,
  driver = "Cloudera ODBC Driver for Impala",
  host = "host",
  port = 21050,
  database = "default",
  uid = "username",
  pwd = "password"
)
## End(Not run)

# Using JDBC connectivity:

## Not run:
library(RJDBC)

Sys.setenv(JAVA_HOME = "/path/to/java/home/")
impala_classpath <- list.files(
  path = "/path/to/jdbc/driver",
  pattern = "\\.jar$",
  full.names = TRUE
)
.jinit(classpath = impala_classpath)
drv <- JDBC(
  driverClass = "com.cloudera.impala.jdbc41.Driver",
  classPath = impala_classpath,
  identifier.quote = "`"
)
impala <- src_impala(
  drv,
  "jdbc:impala://host:21050",
  "username",
  "password"
)
## End(Not run)

See Also

Impala ODBC driver, Impala JDBC driver

  • Maintainer: Ian Cook
  • License: Apache License 2.0 | file LICENSE
  • Last published: 2024-02-06