sparklyr 1.8.6 package

R Interface to Apache Spark

arrow_enabled_object

Determine whether arrow is able to serialize the given R object

checkpoint_directory

Set/Get Spark checkpoint directory

collect

Collect

collect_from_rds

Collect Spark data serialized in RDS format into R

compile_package_jars

Compile Scala sources into a Java Archive (jar)

connection_config

Read configuration values for a connection

connection_is_open

Check whether the connection is open

connection_spark_shinyapp

A Shiny app that can be used to construct a spark_connect statement

copy_to

Copy To

copy_to.spark_connection

Copy an R Data Frame to Spark

DBISparkResult-class

DBI Spark Result.

distinct

Distinct

download_scalac

Downloads default Scala Compilers

dplyr_hof

dplyr wrappers for Apache Spark higher order functions

ensure

Enforce Specific Structure for R Objects

fill

Fill

filter

Filter

find_scalac

Discover the Scala Compiler

ft_binarizer

Feature Transformation -- Binarizer (Transformer)

ft_bucketizer

Feature Transformation -- Bucketizer (Transformer)

ft_chisq_selector

Feature Transformation -- ChiSqSelector (Estimator)

ft_count_vectorizer

Feature Transformation -- CountVectorizer (Estimator)

ft_dct

Feature Transformation -- Discrete Cosine Transform (DCT) (Transformer)

ft_elementwise_product

Feature Transformation -- ElementwiseProduct (Transformer)

ft_feature_hasher

Feature Transformation -- FeatureHasher (Transformer)

ft_hashing_tf

Feature Transformation -- HashingTF (Transformer)

ft_idf

Feature Transformation -- IDF (Estimator)

ft_imputer

Feature Transformation -- Imputer (Estimator)

ft_index_to_string

Feature Transformation -- IndexToString (Transformer)

ft_interaction

Feature Transformation -- Interaction (Transformer)

ft_lsh

Feature Transformation -- LSH (Estimator)

ft_lsh_utils

Utility functions for LSH models

ft_max_abs_scaler

Feature Transformation -- MaxAbsScaler (Estimator)

ft_min_max_scaler

Feature Transformation -- MinMaxScaler (Estimator)

ft_ngram

Feature Transformation -- NGram (Transformer)

ft_normalizer

Feature Transformation -- Normalizer (Transformer)

ft_one_hot_encoder

Feature Transformation -- OneHotEncoder (Transformer)

ft_one_hot_encoder_estimator

Feature Transformation -- OneHotEncoderEstimator (Estimator)

ft_pca

Feature Transformation -- PCA (Estimator)

ft_polynomial_expansion

Feature Transformation -- PolynomialExpansion (Transformer)

ft_quantile_discretizer

Feature Transformation -- QuantileDiscretizer (Estimator)

ft_r_formula

Feature Transformation -- RFormula (Estimator)

ft_regex_tokenizer

Feature Transformation -- RegexTokenizer (Transformer)

ft_robust_scaler

Feature Transformation -- RobustScaler (Estimator)

ft_standard_scaler

Feature Transformation -- StandardScaler (Estimator)

ft_stop_words_remover

Feature Transformation -- StopWordsRemover (Transformer)

ft_string_indexer

Feature Transformation -- StringIndexer (Estimator)

ft_tokenizer

Feature Transformation -- Tokenizer (Transformer)

ft_vector_assembler

Feature Transformation -- VectorAssembler (Transformer)

ft_vector_indexer

Feature Transformation -- VectorIndexer (Estimator)

ft_vector_slicer

Feature Transformation -- VectorSlicer (Transformer)

ft_word2vec

Feature Transformation -- Word2Vec (Estimator)

full_join

Full join

generic_call_interface

Generic Call Interface

get_spark_sql_catalog_implementation

Retrieve the Spark connection's SQL catalog implementation property

grapes-greater-than-grapes

Infix operator for composing a lambda expression

hive_context_config

Runtime configuration interface for Hive

hof_aggregate

Apply Aggregate Function to Array Column

hof_array_sort

Sorts array using a custom comparator

hof_exists

Determine Whether Some Element Exists in an Array Column

hof_filter

Filter Array Column

hof_forall

Checks whether all elements in an array satisfy a predicate

hof_map_filter

Filters a map

hof_map_zip_with

Merges two maps into one

hof_transform

Transform Array Column

hof_transform_keys

Transforms keys of a map

hof_transform_values

Transforms values of a map

hof_zip_with

Combines 2 Array Columns

inner_join

Inner join

invoke

Invoke a Method on a JVM Object

invoke_method

Generic Call Interface

j_invoke

Invoke a Java function.

j_invoke_method

Generic Call Interface

jarray

Instantiate a Java array with a specific element type.

jfloat

Instantiate a Java float type.

jfloat_array

Instantiate an Array[Float].

jobj_class

Superclasses of object

jobj_set_param

Parameter Setting for JVM Objects

join.tbl_spark

Join Spark tbls.

left_join

Left join

list_sparklyr_jars

List all sparklyr-*.jar files that have been built

livy_config

Create a Spark Configuration for Livy

livy_install

Install Livy

livy_service

Start Livy

ml-constructors

Constructors for Pipeline Stages

ml-model-constructors

Constructors for ml_model Objects

ml-params

Spark ML -- ML Params

ml-persistence

Spark ML -- Model Persistence

ml-transform-methods

Spark ML -- Transform, fit, and predict methods (ml_ interface)

ml-tuning

Spark ML -- Tuning

ml_add_stage

Add a Stage to a Pipeline

ml_aft_survival_regression

Spark ML -- Survival Regression

ml_als

Spark ML -- ALS

ml_als_tidiers

Tidying methods for Spark ML ALS

ml_bisecting_kmeans

Spark ML -- Bisecting K-Means Clustering

ml_call_constructor

Wrap a Spark ML JVM object

ml_chisquare_test

Chi-square hypothesis testing for categorical data.

ml_clustering_evaluator

Spark ML - Clustering Evaluator

ml_corr

Compute correlation matrix

ml_decision_tree

Spark ML -- Decision Trees

ml_default_stop_words

Default stop words

ml_evaluate

Evaluate the Model on a Validation Set

ml_evaluator

Spark ML - Evaluators

ml_feature_importances

Spark ML - Feature Importance for Tree Models

ml_fpgrowth

Frequent Pattern Mining -- FPGrowth

ml_gaussian_mixture

Spark ML -- Gaussian Mixture clustering.

ml_generalized_linear_regression

Spark ML -- Generalized Linear Regression

ml_glm_tidiers

Tidying methods for Spark ML linear models

ml_gradient_boosted_trees

Spark ML -- Gradient Boosted Trees

ml_isotonic_regression

Spark ML -- Isotonic Regression

ml_isotonic_regression_tidiers

Tidying methods for Spark ML Isotonic Regression

ml_kmeans

Spark ML -- K-Means Clustering

ml_kmeans_cluster_eval

Evaluate a K-means clustering

ml_lda

Spark ML -- Latent Dirichlet Allocation

ml_lda_tidiers

Tidying methods for Spark ML LDA models

ml_linear_regression

Spark ML -- Linear Regression

ml_linear_svc

Spark ML -- LinearSVC

ml_linear_svc_tidiers

Tidying methods for Spark ML linear svc

ml_logistic_regression

Spark ML -- Logistic Regression

ml_logistic_regression_tidiers

Tidying methods for Spark ML Logistic Regression

ml_metrics_binary

Extracts metrics from a fitted table

ml_metrics_multiclass

Extracts metrics from a fitted table

ml_metrics_regression

Extracts metrics from a fitted table

ml_model_data

Extracts data associated with a Spark ML model

ml_multilayer_perceptron_classifier

Spark ML -- Multilayer Perceptron

ml_multilayer_perceptron_tidiers

Tidying methods for Spark ML MLP

ml_naive_bayes

Spark ML -- Naive-Bayes

ml_naive_bayes_tidiers

Tidying methods for Spark ML Naive Bayes

ml_one_vs_rest

Spark ML -- OneVsRest

ml_pca_tidiers

Tidying methods for Spark ML Principal Component Analysis

ml_pipeline

Spark ML -- Pipelines

ml_power_iteration

Spark ML -- Power Iteration Clustering

ml_prefixspan

Frequent Pattern Mining -- PrefixSpan

ml_random_forest

Spark ML -- Random Forest

ml_stage

Spark ML -- Pipeline stage extraction

ml_standardize_formula

Standardize Formula Input for ml_model

ml_summary

Spark ML -- Extraction of summary metrics

ml_survival_regression_tidiers

Tidying methods for Spark ML Survival Regression

ml_tree_tidiers

Tidying methods for Spark ML tree models

ml_uid

Spark ML -- UID

ml_unsupervised_tidiers

Tidying methods for Spark ML unsupervised models

mutate

Mutate

na.replace

Replace Missing Values in Objects

nest

Nest

pipe

Pipe operator

pivot_longer

Pivot longer

pivot_wider

Pivot wider

print_jobj

Generic method for printing a jobj for a connection type

quote_sql_name

Translate input character vector or symbol to a SQL identifier

random_string

Random string generation

reactiveSpark

Reactive spark reader

reexports

Objects exported from other packages

register_extension

Register a Package that Implements a Spark Extension

registerDoSpark

Register a Parallel Backend

replace_na

Replace NA

right_join

Right join

sdf-saveload

Save / Load a Spark DataFrame

sdf-transform-methods

Spark ML -- Transform, fit, and predict methods (sdf_ interface)

sdf_along

Create DataFrame for along Object

sdf_bind

Bind multiple Spark DataFrames by row and column

sdf_broadcast

Broadcast hint

sdf_checkpoint

Checkpoint a Spark DataFrame

sdf_coalesce

Coalesces a Spark DataFrame

sdf_collect

Collect a Spark DataFrame into R.

sdf_copy_to

Copy an Object into Spark

sdf_crosstab

Cross Tabulation

sdf_debug_string

Debug Info for Spark DataFrame

sdf_describe

Compute summary statistics for columns of a data frame

sdf_dim

Support for Dimension Operations

sdf_distinct

Invoke distinct on a Spark DataFrame

sdf_drop_duplicates

Remove duplicates from a Spark DataFrame

sdf_expand_grid

Create a Spark dataframe containing all combinations of inputs

sdf_fast_bind_cols

Fast cbind for Spark DataFrames

sdf_from_avro

Convert column(s) from avro format

sdf_is_streaming

Spark DataFrame is Streaming

sdf_last_index

Returns the last index of a Spark DataFrame

sdf_len

Create DataFrame for Length

sdf_num_partitions

Gets number of partitions of a Spark DataFrame

sdf_partition_sizes

Compute the number of records within each partition of a Spark DataFrame

sdf_persist

Persist a Spark DataFrame

sdf_pivot

Pivot a Spark DataFrame

sdf_project

Project features onto principal components

sdf_quantile

Compute (Approximate) Quantiles with a Spark DataFrame

sdf_random_split

Partition a Spark Dataframe

sdf_rbeta

Generate random samples from a Beta distribution

sdf_rbinom

Generate random samples from a binomial distribution

sdf_rcauchy

Generate random samples from a Cauchy distribution

sdf_rchisq

Generate random samples from a chi-squared distribution

sdf_read_column

Read a Column from a Spark DataFrame

sdf_register

Register a Spark DataFrame

sdf_repartition

Repartition a Spark DataFrame

sdf_residuals

Model Residuals

sdf_rexp

Generate random samples from an exponential distribution

sdf_rgamma

Generate random samples from a Gamma distribution

sdf_rgeom

Generate random samples from a geometric distribution

sdf_rhyper

Generate random samples from a hypergeometric distribution

sdf_rlnorm

Generate random samples from a log normal distribution

sdf_rnorm

Generate random samples from the standard normal distribution

sdf_rpois

Generate random samples from a Poisson distribution

sdf_rt

Generate random samples from a t-distribution

sdf_runif

Generate random samples from the uniform distribution U(0, 1).

sdf_rweibull

Generate random samples from a Weibull distribution.

sdf_sample

Randomly Sample Rows from a Spark DataFrame

sdf_schema

Read the Schema of a Spark DataFrame

sdf_separate_column

Separate a Vector Column into Scalar Columns

sdf_seq

Create DataFrame for Range

sdf_sort

Sort a Spark DataFrame

sdf_sql

Spark DataFrame from SQL

sdf_to_avro

Convert column(s) to avro format

sdf_unnest_longer

Unnest longer

sdf_unnest_wider

Unnest wider

sdf_weighted_sample

Perform Weighted Random Sampling on a Spark DataFrame

sdf_with_sequential_id

Add a Sequential ID Column to a Spark DataFrame

sdf_with_unique_id

Add a Unique ID Column to a Spark DataFrame

select

Select

separate

Separate

spark-api

Access the Spark API

spark-connections

Manage Spark Connections

spark_adaptive_query_execution

Retrieves or sets status of Spark AQE

spark_advisory_shuffle_partition_size

Retrieves or sets advisory size of the shuffle partition

spark_apply

Apply an R Function in Spark

spark_apply_bundle

Create Bundle for Spark Apply

spark_apply_log

Log Writer for Spark Apply

spark_auto_broadcast_join_threshold

Retrieves or sets the auto broadcast join threshold

spark_coalesce_initial_num_partitions

Retrieves or sets the initial number of shuffle partitions before coalescing

spark_coalesce_min_num_partitions

Retrieves or sets the minimum number of shuffle partitions after coalescing

spark_coalesce_shuffle_partitions

Retrieves or sets whether coalescing contiguous shuffle partitions is enabled

spark_compilation_spec

Define a Spark Compilation Specification

spark_compile

Compile Scala sources into a Java Archive

spark_config

Read Spark Configuration

spark_config_exists

A helper function to check whether a value exists under spark_config()

spark_config_kubernetes

Kubernetes Configuration

spark_config_packages

Creates Spark Configuration

spark_config_settings

Retrieve Available Settings

spark_config_value

A helper function to retrieve values from spark_config()

spark_configuration

Runtime configuration interface for the Spark Session

spark_connect_method

Function that negotiates the connection with the Spark back-end

spark_connection-class

spark_connection class

spark_connection

Retrieve the Spark Connection Associated with an R Object

spark_connection_find

Find Spark Connection

spark_context_config

Runtime configuration interface for the Spark Context.

spark_dataframe

Retrieve a Spark DataFrame

spark_default_compilation_spec

Default Compilation Specification for Spark Extensions

spark_default_version

Determine the version that will be used by default if version is NULL

spark_dependency

Define a Spark dependency

spark_dependency_fallback

Fallback to Spark Dependency

spark_extension

Create Spark Extension

spark_get_java

Find path to Java

spark_home_dir

Find the SPARK_HOME directory for a version of Spark

spark_home_set

Set the SPARK_HOME environment variable

spark_ide_connection_open

Set of functions to provide integration with the RStudio IDE

spark_insert_table

Inserts a Spark DataFrame into a Spark table

spark_install

Download and install various versions of Spark

spark_install_find

Find a given Spark installation by version.

spark_install_sync

Helper function to sync the sparkinstall project to sparklyr

spark_integ_test_skip

Lets the package know if it should test a particular functionality or not

spark_jobj-class

spark_jobj class

spark_jobj

Retrieve a Spark JVM Object Reference

spark_last_error

Surfaces the last error from Spark captured by the internal spark_error function

spark_load_table

Reads from a Spark Table into a Spark DataFrame.

spark_log

View Entries in the Spark Log

spark_pipeline_stage

Create a Pipeline Stage Object

spark_read

Read file(s) into a Spark DataFrame using a custom reader

spark_read_avro

Read Apache Avro data into a Spark DataFrame.

spark_read_binary

Read binary data into a Spark DataFrame.

spark_read_csv

Read a CSV file into a Spark DataFrame

spark_read_delta

Read from Delta Lake into a Spark DataFrame.

spark_read_image

Read image data into a Spark DataFrame.

spark_read_jdbc

Read from JDBC connection into a Spark DataFrame.

spark_read_json

Read a JSON file into a Spark DataFrame

spark_read_libsvm

Read a libsvm file into a Spark DataFrame.

spark_read_orc

Read an ORC file into a Spark DataFrame

spark_read_parquet

Read a Parquet file into a Spark DataFrame

spark_read_source

Read from a generic source into a Spark DataFrame.

spark_read_table

Reads from a Spark Table into a Spark DataFrame.

spark_read_text

Read a Text file into a Spark DataFrame

spark_save_table

Saves a Spark DataFrame as a Spark table

spark_statistical_routines

Generate random samples from some distribution

spark_table_name

Generate a Table Name from Expression

spark_version

Get the Spark Version Associated with a Spark Connection

spark_version_from_home

Get the Spark Version Associated with a Spark Installation

spark_versions

Retrieves a dataframe of available Spark versions that can be installed.

spark_web

Open the Spark web interface

spark_write

Write Spark DataFrame to file using a custom writer

spark_write_avro

Serialize a Spark DataFrame into Apache Avro format

spark_write_csv

Write a Spark DataFrame to a CSV

spark_write_delta

Writes a Spark DataFrame into Delta Lake

spark_write_jdbc

Writes a Spark DataFrame into a JDBC table

spark_write_json

Write a Spark DataFrame to a JSON file

spark_write_orc

Write a Spark DataFrame to an ORC file

spark_write_parquet

Write a Spark DataFrame to a Parquet file

spark_write_rds

Write Spark DataFrame to RDS files

spark_write_source

Writes a Spark DataFrame into a generic source

spark_write_table

Writes a Spark DataFrame into a Spark table

spark_write_text

Write a Spark DataFrame to a Text file

sparklyr_get_backend_port

Return the port number of a sparklyr backend.

sql-transformer

Feature Transformation -- SQLTransformer

src_databases

Show database list

stream_find

Find Stream

stream_generate_test

Generate Test Stream

stream_id

Spark Stream's Identifier

stream_lag

Apply lag function to columns of a Spark Streaming DataFrame

stream_name

Spark Stream's Name

stream_read_csv

Read files created by the stream

stream_render

Render Stream

stream_stats

Stream Statistics

stream_stop

Stops a Spark Stream

stream_trigger_continuous

Spark Stream Continuous Trigger

stream_trigger_interval

Spark Stream Interval Trigger

stream_view

View Stream

stream_watermark

Watermark Stream

stream_write_csv

Write files to the stream

stream_write_memory

Write Memory Stream

stream_write_table

Write Stream to Table

sub-.tbl_spark

Subsetting operator for Spark dataframe

tbl_cache

Cache a Spark Table

tbl_change_db

Use specific database

tbl_uncache

Uncache a Spark Table

transform_sdf

Transform a subset of column(s) in a Spark DataFrame

unite

Unite

unnest

Unnest

worker_spark_apply_unbundle

Extracts a bundle of dependencies required by spark_apply()

R interface to Apache Spark, a fast and general engine for big data processing; see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr'-compatible back-end, and exposes an interface to Spark's built-in machine learning algorithms.
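The description above can be illustrated with a minimal sketch of the typical workflow, combining the connection, dplyr, and ML interfaces indexed in this reference. It assumes a local Spark installation is available (one can be fetched with spark_install()); the mtcars dataset and column names are just illustrative.

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark cluster
sc <- spark_connect(master = "local")

# Copy an R data frame into Spark; dplyr verbs are translated to Spark SQL
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)
mtcars_tbl %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))

# Fit one of Spark's built-in ML algorithms through the ml_ interface
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)
summary(fit)

spark_disconnect(sc)
```

The same pattern extends to the other families listed here: sdf_ functions operate on Spark DataFrames directly, ft_ functions build feature-transformation pipeline stages, and stream_ functions handle structured streaming.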

  • Maintainer: Edgar Ruiz
  • License: Apache License 2.0 | file LICENSE
  • Last published: 2024-04-29