DataBackend function

This is the abstract base class for data backends.

Data backends provide a layer of abstraction for various data storage systems. Working directly with a DataBackend is not recommended. Instead, all data access is handled transparently via the Task.
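As a minimal sketch of this workflow, assuming the mlr3 package is installed, the following wraps a small data.frame in a classification task and reads data through the task. The toy data, the column names and the task id "toy" are made up for illustration.

```r
library(mlr3)

# Hypothetical toy data; column names are chosen for illustration only.
data = data.frame(
  x = c(0.1, 0.5, 0.9, 0.3),
  y = factor(c("a", "b", "a", "b"))
)

# The task converts the data.frame into a backend internally.
task = as_task_classif(data, target = "y", id = "toy")

# Data access goes through the task rather than the backend.
first_rows = task$data(rows = 1:2)
print(first_rows)

# The backend is still reachable for inspection.
print(class(task$backend))
```

The backend stays an implementation detail here: the same task code should work regardless of which DataBackend holds the data.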

This package comes with two implementations for backends:

  • DataBackendDataTable, which stores the data as a data.table::data.table().
  • DataBackendMatrix, which stores the data as a sparse Matrix::sparseMatrix().

To connect to out-of-memory database management systems such as SQL servers, see the extension package mlr3db.

Details

The required set of fields and methods to implement a custom DataBackend is listed in the respective sections (see DataBackendDataTable or DataBackendMatrix for exemplary implementations of the interface).

Examples

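library(mlr3)

# Toy data with an integer id column that serves as the primary key
data = data.table::data.table(id = 1:5, x = runif(5), y = sample(letters[1:3], 5, replace = TRUE))
b = DataBackendDataTable$new(data, primary_key = "id")
print(b)

# First two rows, then column "x" for the rows with ids 1 and 2
b$head(2)
b$data(rows = 1:2, cols = "x")

# Distinct values of "y" and number of missing values per column
b$distinct(rows = b$rownames, cols = "y")
b$missings(rows = b$rownames, cols = names(data))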

See Also

Other DataBackend: DataBackendDataTable, DataBackendMatrix, as_data_backend.Matrix()

Public fields

  • primary_key: (character(1))

     Column name of the primary key column of positive and unique integer row ids.
    

Active bindings

  • data_formats: (character())

     Supported data format. Always `"data.table"`. This is deprecated and will be removed in the future.
    
  • hash: (character(1))

     Hash (unique identifier) for this object.
    
  • col_hashes: (named character)

     Hash (unique identifier) for all columns except the `primary_key`: A `character` vector, named by the columns that each element refers to.
     
     Columns of different `Task`s or `DataBackend`s that have agreeing `col_hashes` always represent the same data, given that the same `row`s are selected. The reverse is not necessarily true: There can be columns with the same content that have different `col_hashes`.
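The two hash bindings can be inspected as in the following sketch, which assumes mlr3 is installed and uses made-up toy data. It only relies on what is stated above: `hash` is a single string, and `col_hashes` is a character vector named by the non-key columns.

```r
library(mlr3)

# Hypothetical toy data with "id" as the primary key column.
data = data.table::data.table(
  id = 1:3,
  x = c(0.2, 0.4, 0.6),
  y = c("a", "b", "a")
)
b = DataBackendDataTable$new(data, primary_key = "id")

# One hash for the whole backend.
print(b$hash)

# One hash per column, excluding the primary key.
print(b$col_hashes)
print(names(b$col_hashes))
```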
    

Methods

Public methods

Method new()

Creates a new instance of this R6 class.

Note: This object is typically constructed via a derived class, e.g. DataBackendDataTable or DataBackendMatrix, or via the S3 method as_data_backend().
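A short sketch of the as_data_backend() route, assuming mlr3 is installed; the data.frame and its column names are invented for illustration.

```r
library(mlr3)

# Hypothetical data.frame with an integer id column.
data = data.frame(id = 1:3, x = c(1.5, 2.5, 3.5))

# Convert to a backend, using "id" as the primary key.
b = as_data_backend(data, primary_key = "id")

print(class(b))
print(b$primary_key)
```

For a data.frame input, the result is expected to be a DataBackendDataTable, so there is usually no need to call a constructor directly.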

Usage

DataBackend$new(data, primary_key, data_formats)

Arguments

  • data: (any)

     The format of the input data depends on the specialization. E.g., DataBackendDataTable expects a `data.table::data.table()` and DataBackendMatrix expects a `Matrix::Matrix()` from [Matrix](https://CRAN.R-project.org/package=Matrix).
    
  • primary_key: (character(1))

     Each DataBackend needs a way to address rows, which is done via a column of unique integer values, referenced here by `primary_key`. The use of this variable may differ between backends.
    
  • data_formats: (character())

     Deprecated: ignored, and will be removed in the future.
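To illustrate the `primary_key` argument, the sketch below uses DataBackendDataTable with ids that deliberately do not start at 1. It assumes mlr3 is installed, and the data is made up. Rows are requested by their key values, not by their position in the table.

```r
library(mlr3)

# Hypothetical data whose primary key values are 11..15 instead of 1..5.
data = data.table::data.table(id = 11:15, x = c(0.1, 0.2, 0.3, 0.4, 0.5))
b = DataBackendDataTable$new(data, primary_key = "id")

# All valid row ids, as stored in the primary key column.
print(b$rownames)

# Select the rows whose key is 11 or 13.
subset = b$data(rows = c(11L, 13L), cols = "x")
print(subset)
```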
    

Method format()

Helper for print outputs.

Usage

DataBackend$format(...)

Arguments

  • ...: (ignored).

Method print()

Printer.

Usage

DataBackend$print()