R Interface to Apache Spark
Determine whether arrow is able to serialize the given R object
Set/Get Spark checkpoint directory
Collect
Collect Spark data serialized in RDS format into R
Compile Scala sources into a Java Archive (jar)
Read configuration values for a connection
Check whether the connection is open
A Shiny app that can be used to construct a spark_connect statement
Copy To
Copy an R Data Frame to Spark
DBI Spark Result.
Distinct
Downloads default Scala Compilers
dplyr wrappers for Apache Spark higher order functions
Enforce Specific Structure for R Objects
Fill
Filter
Discover the Scala Compiler
Feature Transformation -- Binarizer (Transformer)
Feature Transformation -- Bucketizer (Transformer)
Feature Transformation -- ChiSqSelector (Estimator)
Feature Transformation -- CountVectorizer (Estimator)
Feature Transformation -- Discrete Cosine Transform (DCT) (Transformer)
Feature Transformation -- ElementwiseProduct (Transformer)
Feature Transformation -- FeatureHasher (Transformer)
Feature Transformation -- HashingTF (Transformer)
Feature Transformation -- IDF (Estimator)
Feature Transformation -- Imputer (Estimator)
Feature Transformation -- IndexToString (Transformer)
Feature Transformation -- Interaction (Transformer)
Feature Transformation -- LSH (Estimator)
Utility functions for LSH models
Feature Transformation -- MaxAbsScaler (Estimator)
Feature Transformation -- MinMaxScaler (Estimator)
Feature Transformation -- NGram (Transformer)
Feature Transformation -- Normalizer (Transformer)
Feature Transformation -- OneHotEncoder (Transformer)
Feature Transformation -- OneHotEncoderEstimator (Estimator)
Feature Transformation -- PCA (Estimator)
Feature Transformation -- PolynomialExpansion (Transformer)
Feature Transformation -- QuantileDiscretizer (Estimator)
Feature Transformation -- RFormula (Estimator)
Feature Transformation -- RegexTokenizer (Transformer)
Feature Transformation -- RobustScaler (Estimator)
Feature Transformation -- StandardScaler (Estimator)
Feature Transformation -- StopWordsRemover (Transformer)
Feature Transformation -- StringIndexer (Estimator)
Feature Transformation -- Tokenizer (Transformer)
Feature Transformation -- VectorAssembler (Transformer)
Feature Transformation -- VectorIndexer (Estimator)
Feature Transformation -- VectorSlicer (Transformer)
Feature Transformation -- Word2Vec (Estimator)
Full join
Generic Call Interface
Retrieve the Spark connection's SQL catalog implementation property
Infix operator for composing a lambda expression
Runtime configuration interface for Hive
Apply Aggregate Function to Array Column
Sorts array using a custom comparator
Determine Whether Some Element Exists in an Array Column
Filter Array Column
Checks whether all elements in an array satisfy a predicate
Filters a map
Merges two maps into one
Transform Array Column
Transforms keys of a map
Transforms values of a map
Combines 2 Array Columns
Inner join
Invoke a Method on a JVM Object
Generic Call Interface
Invoke a Java function.
Generic Call Interface
Instantiate a Java array with a specific element type.
Instantiate a Java float type.
Instantiate an Array[Float].
Superclasses of object
Parameter Setting for JVM Objects
Join Spark tbls.
Left join
List all sparklyr-*.jar files that have been built
Create a Spark Configuration for Livy
Install Livy
Start Livy
Constructors for Pipeline Stages
Constructors for ml_model Objects
Spark ML -- ML Params
Spark ML -- Model Persistence
Spark ML -- Transform, fit, and predict methods (ml_ interface)
Spark ML -- Tuning
Add a Stage to a Pipeline
Spark ML -- Survival Regression
Spark ML -- ALS
Tidying methods for Spark ML ALS
Spark ML -- Bisecting K-Means Clustering
Wrap a Spark ML JVM object
Chi-square hypothesis testing for categorical data.
Spark ML - Clustering Evaluator
Compute correlation matrix
Spark ML -- Decision Trees
Default stop words
Evaluate the Model on a Validation Set
Spark ML - Evaluators
Spark ML - Feature Importance for Tree Models
Frequent Pattern Mining -- FPGrowth
Spark ML -- Gaussian Mixture clustering.
Spark ML -- Generalized Linear Regression
Tidying methods for Spark ML linear models
Spark ML -- Gradient Boosted Trees
Spark ML -- Isotonic Regression
Tidying methods for Spark ML Isotonic Regression
Spark ML -- K-Means Clustering
Evaluate a K-means clustering
Spark ML -- Latent Dirichlet Allocation
Tidying methods for Spark ML LDA models
Spark ML -- Linear Regression
Spark ML -- LinearSVC
Tidying methods for Spark ML linear svc
Spark ML -- Logistic Regression
Tidying methods for Spark ML Logistic Regression
Extracts metrics from a fitted table
Extracts metrics from a fitted table
Extracts metrics from a fitted table
Extracts data associated with a Spark ML model
Spark ML -- Multilayer Perceptron
Tidying methods for Spark ML MLP
Spark ML -- Naive-Bayes
Tidying methods for Spark ML Naive Bayes
Spark ML -- OneVsRest
Tidying methods for Spark ML Principal Component Analysis
Spark ML -- Pipelines
Spark ML -- Power Iteration Clustering
Frequent Pattern Mining -- PrefixSpan
Spark ML -- Random Forest
Spark ML -- Pipeline stage extraction
Standardize Formula Input for ml_model
Spark ML -- Extraction of summary metrics
Tidying methods for Spark ML Survival Regression
Tidying methods for Spark ML tree models
Spark ML -- UID
Tidying methods for Spark ML unsupervised models
Mutate
Replace Missing Values in Objects
Nest
Pipe operator
Pivot longer
Pivot wider
Generic method for printing a jobj for a connection type
Translate input character vector or symbol to a SQL identifier
Random string generation
Reactive spark reader
Objects exported from other packages
Register a Package that Implements a Spark Extension
Register a Parallel Backend
Replace NA
Right join
Save / Load a Spark DataFrame
Spark ML -- Transform, fit, and predict methods (sdf_ interface)
Create DataFrame along an Object
Bind multiple Spark DataFrames by row and column
Broadcast hint
Checkpoint a Spark DataFrame
Coalesces a Spark DataFrame
Collect a Spark DataFrame into R.
Copy an Object into Spark
Cross Tabulation
Debug Info for Spark DataFrame
Compute summary statistics for columns of a data frame
Support for Dimension Operations
Invoke distinct on a Spark DataFrame
Remove duplicates from a Spark DataFrame
Create a Spark dataframe containing all combinations of inputs
Fast cbind for Spark DataFrames
Convert column(s) from avro format
Spark DataFrame is Streaming
Returns the last index of a Spark DataFrame
Create DataFrame for Length
Gets number of partitions of a Spark DataFrame
Compute the number of records within each partition of a Spark DataFrame
Persist a Spark DataFrame
Pivot a Spark DataFrame
Project features onto principal components
Compute (Approximate) Quantiles with a Spark DataFrame
Partition a Spark DataFrame
Generate random samples from a Beta distribution
Generate random samples from a binomial distribution
Generate random samples from a Cauchy distribution
Generate random samples from a chi-squared distribution
Read a Column from a Spark DataFrame
Register a Spark DataFrame
Repartition a Spark DataFrame
Model Residuals
Generate random samples from an exponential distribution
Generate random samples from a Gamma distribution
Generate random samples from a geometric distribution
Generate random samples from a hypergeometric distribution
Generate random samples from a log normal distribution
Generate random samples from the standard normal distribution
Generate random samples from a Poisson distribution
Generate random samples from a t-distribution
Generate random samples from the uniform distribution U(0, 1).
Generate random samples from a Weibull distribution.
Randomly Sample Rows from a Spark DataFrame
Read the Schema of a Spark DataFrame
Separate a Vector Column into Scalar Columns
Create DataFrame for Range
Sort a Spark DataFrame
Spark DataFrame from SQL
Convert column(s) to avro format
Unnest longer
Unnest wider
Perform Weighted Random Sampling on a Spark DataFrame
Add a Sequential ID Column to a Spark DataFrame
Add a Unique ID Column to a Spark DataFrame
Select
Separate
Access the Spark API
Manage Spark Connections
Retrieves or sets status of Spark AQE
Retrieves or sets advisory size of the shuffle partition
Apply an R Function in Spark
Create Bundle for Spark Apply
Log Writer for Spark Apply
Retrieves or sets the auto broadcast join threshold
Retrieves or sets initial number of shuffle partitions before coalescing
Retrieves or sets the minimum number of shuffle partitions after coalescing
Retrieves or sets whether coalescing contiguous shuffle partitions is enabled
Define a Spark Compilation Specification
Compile Scala sources into a Java Archive
Read Spark Configuration
A helper function to check whether a value exists under spark_config()
Kubernetes Configuration
Creates Spark Configuration
Retrieve Available Settings
A helper function to retrieve values from spark_config()
Runtime configuration interface for the Spark Session
Function that negotiates the connection with the Spark back-end
spark_connection class
Retrieve the Spark Connection Associated with an R Object
Find Spark Connection
Runtime configuration interface for the Spark Context.
Retrieve a Spark DataFrame
Default Compilation Specification for Spark Extensions
Determine the version that will be used by default if version is NULL
Define a Spark dependency
Fallback to Spark Dependency
Create Spark Extension
Find path to Java
Find the SPARK_HOME directory for a version of Spark
Set the SPARK_HOME environment variable
Set of functions to provide integration with the RStudio IDE
Inserts a Spark DataFrame into a Spark table
Download and install various versions of Spark
Find a given Spark installation by version.
Helper function to sync the sparkinstall project to sparklyr
Lets the package know whether it should test a particular functionality
spark_jobj class
Retrieve a Spark JVM Object Reference
Surfaces the last error from Spark captured by the internal spark_error function
...
Reads from a Spark Table into a Spark DataFrame.
View Entries in the Spark Log
Create a Pipeline Stage Object
Read file(s) into a Spark DataFrame using a custom reader
Read Apache Avro data into a Spark DataFrame.
Read binary data into a Spark DataFrame.
Read a CSV file into a Spark DataFrame
Read from Delta Lake into a Spark DataFrame.
Read image data into a Spark DataFrame.
Read from JDBC connection into a Spark DataFrame.
Read a JSON file into a Spark DataFrame
Read libsvm file into a Spark DataFrame.
Read an ORC file into a Spark DataFrame
Read a Parquet file into a Spark DataFrame
Read from a generic source into a Spark DataFrame.
Reads from a Spark Table into a Spark DataFrame.
Read a Text file into a Spark DataFrame
Saves a Spark DataFrame as a Spark table
Generate random samples from some distribution
Generate a Table Name from Expression
Get the Spark Version Associated with a Spark Connection
Get the Spark Version Associated with a Spark Installation
Retrieves a data frame of available Spark versions that can be installed.
Open the Spark web interface
Write Spark DataFrame to file using a custom writer
Serialize a Spark DataFrame into Apache Avro format
Write a Spark DataFrame to a CSV
Writes a Spark DataFrame into Delta Lake
Writes a Spark DataFrame into a JDBC table
Write a Spark DataFrame to a JSON file
Write a Spark DataFrame to an ORC file
Write a Spark DataFrame to a Parquet file
Write Spark DataFrame to RDS files
Writes a Spark DataFrame into a generic source
Writes a Spark DataFrame into a Spark table
Write a Spark DataFrame to a Text file
Return the port number of a sparklyr backend
Feature Transformation -- SQLTransformer
Show database list
Find Stream
Generate Test Stream
Spark Stream's Identifier
Apply lag function to columns of a Spark Streaming DataFrame
Spark Stream's Name
Read files created by the stream
Render Stream
Stream Statistics
Stops a Spark Stream
Spark Stream Continuous Trigger
Spark Stream Interval Trigger
View Stream
Watermark Stream
Write files to the stream
Write Memory Stream
Write Stream to Table
Subsetting operator for Spark DataFrames
Cache a Spark Table
Use specific database
Uncache a Spark Table
Transform a subset of column(s) in a Spark DataFrame
Unite
Unnest
Extracts a bundle of dependencies required by spark_apply()
R interface to Apache Spark, a fast and general engine for big data processing; see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr'-compatible back-end, and offers an interface to Spark's built-in machine learning algorithms.
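As a minimal sketch of a typical session (assuming a local Spark installation, e.g. one set up with spark_install(); mtcars is R's built-in dataset):

    library(sparklyr)
    library(dplyr)

    # Connect to a local Spark instance
    sc <- spark_connect(master = "local")

    # Copy an R data frame to Spark and query it with the dplyr back-end
    mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)
    mtcars_tbl %>%
      group_by(cyl) %>%
      summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
      collect()

    # Fit one of Spark's built-in ML algorithms via the ml_ interface
    fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)

    spark_disconnect(sc)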