Memory-Efficient Storage of Large Data on Disk and Fast Access Functions
Incrementing an ff or ram object
Array: make vector from array
Array: make vector positions from array index
Conversion between bit and ff boolean
Coercing ram to ff and ff to ram objects
Coercing to ffdf and data.frame
Hybrid Index, coercion to
Hybrid Index, coercing from
Coercing to virtual mode
Sampling from large pools
Collapsing functions for batch processing
Chunk ff_vector and ffdf
Cloning ff and ram objects
Cloning ffdf objects
Closing ff files
Deleting the file behind an ff object
Getting and setting dim and dimorder
Getting and setting dimnames
Getting and setting dimnames of ffdf
Test for dimorder compatibility
Array: make dimnames
Reading and writing vectors and arrays (high-level)
Reading and writing data.frames (ffdf)
ff classes for representing (large) atomic data
Apply for ff objects
Get most conforming argument
ff class for data.frames
Reading and writing ffdf data.frame using ff subscripts
Sorting: convenience wrappers for data.frames
Delete an ffarchive
Reading and writing ff vectors using ff subscripts
Sorting: chunked ordering of integer subscript positions
Inspect content of ff saves
Reload ffSaved Datasets
Sorting: order from ff vectors
Return suitable ff object
Save R and ff objects
Sorting of ff vectors
Test ff object for suitability
Test for availability of ff extensions
Change size or move an existing file
Get or set filename
Call finalizer
Get and set finalizer (name)
Test for fixed diagonal
Forbidden ffdf functions
Get error and error string
Get page size information
Reading and writing vectors of values (low-level)
Hybrid index class
Hybrid Index, parsing
Internal ffdf functions
Test for class ff
Test for class ffdf
Test if object is opened
Get readonly status
Getting and setting 'is.sorted' physical attribute
Getting and setting length
Getting length of an ffdf data.frame
Hybrid Index, querying
Getting and setting factor levels
ff Limitations and Warnings
Array: make matrix indices from row and column positions
Print beginning and end of big matrix
Lossless vmode coercibility
Get physical length of an ff or ram object
Test for recycle mismatch
Getting and setting 'na.count' physical attribute
Getting and setting names
Assigning the number of rows or columns
Opening an ff file
Pagesize of ff object
Getting and setting physical and virtual attributes of ff objects
Getting physical and virtual attributes of ffdf objects
Print and str methods
Factor codings
Get ramclass and ramattribs
Sorting: order R vector in-RAM and in-place
Sorting: Sort R vector in-RAM and in-place
Importing csv files into ff data.frames
Reading and writing vectors (low-level)
Sorting: regression tests
Replicate with names
Factor level manipulation
Analyze pathfile-strings
Reading and writing in one operation (high-level)
Test for symmetric structure
Array: make vector positions from symmetric array index
Unclassed assignment
Undim
Hybrid Index, internal utilities
Update ff content from another object
Print beginning and end of big vector
Create vector of virtual mode
Array: make array from vector
Array: make array from index vector positions
Virtual storage mode of ffdf
Virtual storage mode
Virtual transpose
Getting and setting virtual windows
Exporting csv files from ff data.frames
The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) into main memory; the pagesize is the effective virtual memory consumption per ff object. ff supports R's standard atomic data types 'double', 'logical', 'raw' and 'integer' and the non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned) and single (4 byte float with NAs). For example, 'quad' allows efficient storage of genomic data as an 'A','T','G','C' factor. The unsigned types support 'circular' arithmetic. There is also support for the close-to-atomic types 'factor', 'ordered', 'POSIXct' and 'Date', as well as custom close-to-atomic types. ff has native C support for vectors, matrices and arrays with flexible dimorder (column-major order, row-major order and generalizations for arrays). There is also an ffdf class similar to data.frame, along with import/export filters for csv files. ff objects store raw data in binary flat files in native encoding and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which give rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data encoding/decoding schemes) in the same process, or by multiple R processes to exploit parallelism. A wide choice of finalizer options allows working with 'permanent' files as well as creating and removing 'temporary' ff files completely transparently to the user. On certain OS/filesystem combinations, ff files are created without notable delay thanks to sparse file allocation.
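To make the idea of a 2-bit unsigned 'quad' type concrete, here is a minimal Python sketch. It packs an 'A','T','G','C' sequence into four codes per byte and shows 'circular' (wrap-around) arithmetic on the 2-bit codes. This is a sketch under stated assumptions, not ff's API or its on-disk layout. The level order, the low-bits-first packing, and the helper names pack_quad, unpack_quad and quad_add are all hypothetical. The sketch also assumes that "circular arithmetic" means wrapping modulo 2^bits.

```python
# Hypothetical sketch, NOT ff's API or file format: illustrates 2-bit 'quad' storage.
# Assumptions: levels are coded 0..3 in the order A, T, G, C, codes are packed
# four per byte starting at the low bits, and "circular" arithmetic wraps modulo 4.

LEVELS = ["A", "T", "G", "C"]                  # assumed factor levels; code = index
CODE = {base: i for i, base in enumerate(LEVELS)}

def pack_quad(seq: str) -> bytes:
    """Pack a base string into 2-bit codes, four codes per byte (low bits first)."""
    out = bytearray((len(seq) + 3) // 4)       # ceil(len / 4) bytes
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)

def unpack_quad(buf: bytes, n: int) -> str:
    """Decode the first n 2-bit codes back into bases."""
    return "".join(LEVELS[(buf[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n))

def quad_add(code: int, k: int) -> int:
    """Unsigned 2-bit addition that wraps around instead of overflowing."""
    return (code + k) % 4

if __name__ == "__main__":
    seq = "GATTACA"
    packed = pack_quad(seq)
    print(len(seq), "bases ->", len(packed), "bytes")   # 7 bases -> 2 bytes
    print(unpack_quad(packed, len(seq)))                 # GATTACA
    print(quad_add(3, 2))                                # 1, because (3 + 2) % 4 == 1
```

Seven bases fit in 2 bytes here, instead of the 28 bytes that seven 4-byte integer codes would need. That saving is the motivation the description gives for compact non-standard types.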
Several access optimization techniques, such as Hybrid Index Preprocessing and Virtualization, are implemented to achieve good performance even with large datasets; one example is a virtual matrix transpose that does not touch a single byte on disk. Further, to reduce disk I/O, 'logicals' and non-standard data types are stored natively and compactly in binary flat files; for example, logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for both ff and ram objects, and it supports batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package 'bit': chunked looping, fast bit operations and coercions between different objects that can store subscript information ('bit', 'bitwhich', ff 'boolean', ri range index, hi hybrid index). This allows working interactively with selections of large datasets and quickly modifying selection criteria. Further high-performance enhancements can be made available upon request.
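A virtual transpose can avoid touching the data because only metadata changes: the dim is reversed and the dimorder is remapped, while the flat buffer stays as it is. The Python sketch below illustrates this for a 2-D array under stated assumptions. It is not ff's implementation. The class FlatArray2D and its methods are hypothetical, and indices and dimorder are 0-based here, whereas R uses 1-based values such as c(1,2). In this sketch, dimorder[0] names the fastest-varying dimension, so (0, 1) corresponds to column-major order and (1, 0) to row-major order.

```python
# Hypothetical sketch, NOT ff's implementation: a 2-D array over a flat buffer
# whose element order is controlled by a (0-based) dimorder. Transposing swaps
# dim and remaps dimorder; the underlying buffer is shared, never copied.
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class FlatArray2D:
    data: Sequence[float]        # flat storage, standing in for the on-disk file
    dim: Tuple[int, int]         # (nrow, ncol) as seen by the user
    dimorder: Tuple[int, int]    # dimorder[0] = fastest-varying dimension

    def _offset(self, i: int, j: int) -> int:
        idx = (i, j)
        fast, slow = self.dimorder
        # Position in the flat buffer: fast index plus slow index times fast extent.
        return idx[fast] + idx[slow] * self.dim[fast]

    def __getitem__(self, ij: Tuple[int, int]) -> float:
        i, j = ij
        return self.data[self._offset(i, j)]

    def virtual_transpose(self) -> "FlatArray2D":
        # New axis 0 is old axis 1 and vice versa, so old axis a becomes new axis 1 - a.
        return FlatArray2D(
            self.data,                                   # same buffer object, no copy
            (self.dim[1], self.dim[0]),
            (1 - self.dimorder[0], 1 - self.dimorder[1]),
        )

    def tolist(self) -> List[List[float]]:
        nrow, ncol = self.dim
        return [[self[i, j] for j in range(ncol)] for i in range(nrow)]

if __name__ == "__main__":
    m = FlatArray2D([1, 2, 3, 4, 5, 6], dim=(2, 3), dimorder=(0, 1))  # column-major
    t = m.virtual_transpose()
    print(m.tolist())        # [[1, 3, 5], [2, 4, 6]]
    print(t.tolist())        # [[1, 2], [3, 4], [5, 6]]
    print(t.data is m.data)  # True: the transpose shares the buffer
```

The transposed view reads the same bytes in a different order. Its cost is therefore constant, however large the buffer is; what changes is the access pattern of later reads.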