This is the abstract base class for data backends.
Data backends provide a layer of abstraction for various data storage systems. It is not recommended to work directly with the DataBackend. Instead, all data access is handled transparently via the Task.
This package comes with two implementations for backends:
DataBackendDataTable, which stores the data as a data.table::data.table().
DataBackendMatrix, which stores the data as a sparse Matrix::sparseMatrix().
To connect to out-of-memory database management systems such as SQL servers, see the extension package mlr3db.
Details
The required set of fields and methods to implement a custom DataBackend is listed in the respective sections (see DataBackendDataTable or DataBackendMatrix for exemplary implementations of the interface).
Examples
    data = data.table::data.table(id = 1:5, x = runif(5), y = sample(letters[1:3], 5, replace = TRUE))
    b = DataBackendDataTable$new(data, primary_key = "id")
    print(b)
    b$head(2)
    b$data(rows = 1:2, cols = "x")
    b$distinct(rows = b$rownames, "y")
    b$missings(rows = b$rownames, cols = names(data))
Other DataBackend: DataBackendDataTable, DataBackendMatrix, as_data_backend.Matrix()
Public fields
primary_key: (character(1))
Column name of the primary key column of positive and unique integer row ids.
Active bindings
data_formats: (character())
Supported data format. Always `"data.table"`. This is deprecated and will be removed in the future.
hash: (character(1))
Hash (unique identifier) for this object.
col_hashes: (named character)
Hash (unique identifier) for all columns except the `primary_key`: A `character` vector, named by the columns that each element refers to.
Columns of different `Task`s or `DataBackend`s that have agreeing `col_hashes` always represent the same data, given that the same `row`s are selected. The reverse is not necessarily true: There can be columns with the same content that have different `col_hashes`.
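The following is a minimal sketch of how the `hash` and `col_hashes` bindings can be inspected. It assumes the package is attached via `library(mlr3)`; the data and the column names `id`, `x` and `y` are hypothetical and chosen only for illustration. No hash values are shown because they depend on the data and the package version.

```r
library(mlr3)

# Hypothetical example data with an integer key column `id`.
dt = data.table::data.table(
  id = 1:4,
  x = c(0.1, 0.2, 0.3, 0.4),
  y = c("a", "b", "a", "c")
)
b = DataBackendDataTable$new(dt, primary_key = "id")

# A single string identifying the backend as a whole.
b$hash

# One hash per column, named by column; the primary key is excluded.
b$col_hashes
names(b$col_hashes)
```

Comparing `col_hashes` is useful only in one direction: equal hashes indicate the same column data, while unequal hashes do not prove that the contents differ.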
Note: This object is typically constructed via a derived class, e.g. DataBackendDataTable or DataBackendMatrix, or via the S3 method as_data_backend().
Usage
DataBackend$new(data, primary_key, data_formats)
Arguments
data: (any)
The format of the input data depends on the specialization. E.g., DataBackendDataTable expects a `data.table::data.table()` and DataBackendMatrix expects a `Matrix::Matrix()` from [Matrix](https://CRAN.R-project.org/package=Matrix).
primary_key: (character(1))
Each DataBackend needs a way to address rows, which is done via a column of unique integer values, referenced here by `primary_key`. The use of this variable may differ between backends.
data_formats: (character())
Deprecated: ignored, and will be removed in the future.
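To illustrate the arguments above, here is a short sketch of the two construction routes mentioned in the note: a derived class with an explicit `primary_key`, and `as_data_backend()`. It assumes the package is attached via `library(mlr3)`. The data, the column names and the non-consecutive ids are hypothetical; the behavior of `as_data_backend()` on a plain `data.frame` (generating a row id column when none is given) is stated here as an assumption about that S3 method, not taken from this page.

```r
library(mlr3)

# Route 1: derived class with an explicit primary key column.
# The ids are deliberately non-consecutive to show that rows are
# addressed by key value, not by position.
dt = data.table::data.table(id = c(10L, 20L, 30L), x = c(1.5, 2.5, 3.5))
b1 = DataBackendDataTable$new(dt, primary_key = "id")

b1$primary_key                              # name of the key column
b1$rownames                                 # ids stored in the key column
res = b1$data(rows = c(10L, 30L), cols = "x")  # select rows by id
res

# Route 2: S3 conversion of a data.frame.
# Assumption: without a primary_key argument, a row id column is
# created automatically by the data.frame method.
df = data.frame(x = c(1.5, 2.5, 3.5), y = c("a", "b", "a"))
b2 = as_data_backend(df)
class(b2)
b2$primary_key
```

In regular use neither backend is queried directly; it is handed to a Task, which performs the data access.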