Format-specific scan options
A FragmentScanOptions holds options specific to a FileFormat and a scan operation.
FragmentScanOptions$create() takes the following arguments:
format
: A string identifier of the file format. Supported values include "parquet" and "text", whose options are described below.
...
: Additional format-specific options
format = "parquet"
:
use_buffered_stream
: Read files through buffered input streams rather than loading entire row groups at once. This may be enabled to reduce memory overhead. Disabled by default.
buffer_size
: Size of buffered stream, if enabled. Default is 8KB.
pre_buffer
: Pre-buffer the raw Parquet data. This can improve performance on high-latency filesystems. Disabled by default.
thrift_string_size_limit
: Maximum string size allocated for decoding thrift strings. May need to be increased in order to read files with especially large headers. Default value 100000000.
thrift_container_size_limit
: Maximum size of thrift containers. May need to be increased in order to read files with especially large headers. Default value 1000000.
format = "text"
: see CsvConvertOptions. Note that options can only be specified with the Arrow C++ library naming. Also, "block_size" from CsvReadOptions may be given.

It returns the appropriate subclass of FragmentScanOptions (e.g. CsvFragmentScanOptions).
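As an illustration of the arguments described above, the following is a minimal sketch of creating scan options for each format. It assumes the arrow package is installed with dataset support. The specific values (a 64 KB buffer, the chosen null strings) are arbitrary examples rather than recommendations, and null_values and strings_can_be_null are assumed here to be CsvConvertOptions settings under their Arrow C++ names.

```r
library(arrow)

# Parquet: read through a buffered stream instead of loading whole
# row groups, with an illustrative 64 KB buffer.
parquet_opts <- FragmentScanOptions$create(
  "parquet",
  use_buffered_stream = TRUE,
  buffer_size = 64 * 1024
)

# Text/CSV: convert options are passed with their Arrow C++ names.
# Here the strings "NA" and "" are treated as nulls (example values).
text_opts <- FragmentScanOptions$create(
  "text",
  null_values = c("NA", ""),
  strings_can_be_null = TRUE
)

# Both results are subclasses of FragmentScanOptions.
class(parquet_opts)
class(text_opts)
```

The resulting objects would then be supplied to a scan over a dataset of the matching format; how they are passed depends on the scanning interface in use.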