new_extension_type function

Extension types

Extension arrays are wrappers around regular Arrow Array objects that provide some customized behaviour and/or storage. A common use-case for extension types is to define a customized conversion between an Arrow Array and an R object when the default conversion is slow or loses metadata important to the interpretation of values in the array. For most types, the built-in vctrs extension type is probably sufficient.

new_extension_type(
  storage_type,
  extension_name,
  extension_metadata = raw(),
  type_class = ExtensionType
)

new_extension_array(storage_array, extension_type)

register_extension_type(extension_type)

reregister_extension_type(extension_type)

unregister_extension_type(extension_name)

Arguments

  • storage_type: The data type of the underlying storage array.
  • extension_name: The extension name. This should be namespaced using "dot" syntax (i.e., "some_package.some_type"). The namespace "arrow" is reserved for extension types defined by the Apache Arrow libraries.
  • extension_metadata: A raw() or character() vector containing the serialized version of the type. Character vectors must be length 1 and are converted to UTF-8 before converting to raw().
  • type_class: An R6::R6Class whose $new() class method will be used to construct a new instance of the type.
  • storage_array: An Array object of the underlying storage.
  • extension_type: An ExtensionType instance.
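The extension_metadata conversion described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the name "mypkg.example_type" and the metadata string are made up for the example, and the type is never registered.

```r
library(arrow)

# Hypothetical extension name and metadata, used only for illustration
meta <- "center=20;scale=2"

# Metadata supplied as a length-1 character vector...
type_chr <- new_extension_type(
  storage_type = int32(),
  extension_name = "mypkg.example_type",
  extension_metadata = meta
)

# ...and the same metadata supplied directly as a raw() vector
type_raw <- new_extension_type(
  storage_type = int32(),
  extension_name = "mypkg.example_type",
  extension_metadata = charToRaw(meta)
)

# Both forms are stored as raw bytes
same_bytes <- identical(
  type_chr$extension_metadata(),
  type_raw$extension_metadata()
)

# The stored bytes can be read back as a UTF-8 string
meta_back <- type_chr$extension_metadata_utf8()
```

Passing a character vector is convenient when the serialized form is plain text; a raw() vector suits binary serializations.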

Returns

  • new_extension_type() returns an ExtensionType instance according to the type_class specified.

  • new_extension_array() returns an ExtensionArray whose $type corresponds to extension_type.

  • register_extension_type(), unregister_extension_type() and reregister_extension_type() return NULL, invisibly.
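As a rough sketch of these return values, the snippet below builds a type and an array with the default type_class and inspects what comes back. The extension name "mypkg.simple_type" is a placeholder chosen for the example.

```r
library(arrow)

# Hypothetical extension type using the default ExtensionType class
simple_type <- new_extension_type(
  storage_type = int32(),
  extension_name = "mypkg.simple_type"
)

# Wrap an int32 storage array in the extension type
simple_array <- new_extension_array(Array$create(1:3), simple_type)

# The constructor results carry the documented classes
is_type <- inherits(simple_type, "ExtensionType")
is_array <- inherits(simple_array, "ExtensionArray")

# The array's $type reports the extension type it was built with
array_type_name <- simple_array$type$extension_name()

# The underlying storage remains accessible
storage <- simple_array$storage()
```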

Details

These functions create, register, and unregister ExtensionType and ExtensionArray objects. To use an extension type you will have to:

  • Define an R6::R6Class that inherits from ExtensionType and reimplement one or more methods (e.g., deserialize_instance()).
  • Make a type constructor function (e.g., my_extension_type()) that calls new_extension_type() to create an R6 instance that can be used as a data type elsewhere in the package.
  • Make an array constructor function (e.g., my_extension_array()) that calls new_extension_array() to create an Array instance of your extension type.
  • Register a dummy instance of your extension type, created with your type constructor function, by calling register_extension_type().

If you define an extension type in an R package, you will probably want to call reregister_extension_type() in that package's .onLoad() hook. Your package will probably get reloaded in the same R session during its development, and register_extension_type() will error if called twice for the same extension_name. For an example of an extension type that uses most of these features, see vctrs_extension_type().
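The .onLoad() pattern might look like the sketch below. The constructor my_extension_type() and the package name "mypkg" are hypothetical. The hook is called by hand here to simulate a reload, whereas in a real package R calls it and the function would normally be written as arrow::reregister_extension_type().

```r
library(arrow)

# Hypothetical type constructor for a package named "mypkg"
my_extension_type <- function() {
  new_extension_type(
    storage_type = int32(),
    extension_name = "mypkg.my_type"
  )
}

# Hook that would live in the package source (for example R/zzz.R):
# reregistering makes repeated loads in one session safe
.onLoad <- function(libname, pkgname) {
  reregister_extension_type(my_extension_type())
}

# Simulate the package being loaded twice in the same session
.onLoad("lib", "mypkg")
.onLoad("lib", "mypkg")

# By contrast, register_extension_type() errors for a name that is
# already registered; capture the error instead of stopping
second_attempt <- tryCatch(
  {
    register_extension_type(my_extension_type())
    "registered"
  },
  error = function(e) "error"
)

# Clean up so the example leaves no registered type behind
unregister_extension_type("mypkg.my_type")
```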

Examples

# Create the R6 type whose methods control how Array objects are
# converted to R objects, how equality between types is computed,
# and how types are printed.
QuantizedType <- R6::R6Class(
  "QuantizedType",
  inherit = ExtensionType,
  public = list(
    # methods to access the custom metadata fields
    center = function() private$.center,
    scale = function() private$.scale,

    # called when an Array of this type is converted to an R vector
    as_vector = function(extension_array) {
      if (inherits(extension_array, "ExtensionArray")) {
        unquantized_arrow <-
          (extension_array$storage()$cast(float64()) / private$.scale) +
          private$.center

        as.vector(unquantized_arrow)
      } else {
        super$as_vector(extension_array)
      }
    },

    # populate the custom metadata fields from the serialized metadata
    deserialize_instance = function() {
      vals <- as.numeric(strsplit(self$extension_metadata_utf8(), ";")[[1]])
      private$.center <- vals[1]
      private$.scale <- vals[2]
    }
  ),
  private = list(
    .center = NULL,
    .scale = NULL
  )
)

# Create a helper type constructor that calls new_extension_type()
quantized <- function(center = 0, scale = 1, storage_type = int32()) {
  new_extension_type(
    storage_type = storage_type,
    extension_name = "arrow.example.quantized",
    extension_metadata = paste(center, scale, sep = ";"),
    type_class = QuantizedType
  )
}

# Create a helper array constructor that calls new_extension_array()
quantized_array <- function(x, center = 0, scale = 1,
                            storage_type = int32()) {
  type <- quantized(center, scale, storage_type)
  new_extension_array(
    Array$create((x - center) * scale, type = storage_type),
    type
  )
}

# Register the extension type so that Arrow knows what to do when
# it encounters this extension type
reregister_extension_type(quantized())

# Create Array objects and use them!
(vals <- runif(5, min = 19, max = 21))

(array <- quantized_array(
  vals,
  center = 20,
  scale = 2^15 - 1,
  storage_type = int16()
)
)

array$type$center()
array$type$scale()

as.vector(array)

  • Maintainer: Jonathan Keane
  • License: Apache License (>= 2.0)
  • Last published: 2025-02-26