disk.frame 0.8.3 package

Larger-than-RAM Disk-Based Data Manipulation Framework

add_chunk

Add a chunk to the disk.frame

as.data.frame.disk.frame

Convert disk.frame to data.frame by collecting all chunks

as.data.table.disk.frame

Convert disk.frame to data.table by collecting all chunks

as.disk.frame

Make a data.frame into a disk.frame

bind_rows.disk.frame

Bind rows

chunk_group_by

Group by within each chunk of the disk.frame

cmap

Apply the same function to all chunks

cmap2

Apply a function to two disk.frames

collect

Bring the disk.frame into R

colnames

Return the column names of the disk.frame

compute.disk.frame

Force computations. The results are stored in a folder.

create_chunk_mapper

Create a function that applies to each chunk of a disk.frame

csv_to_disk.frame

Convert CSV file(s) to disk.frame format

delete

Delete a disk.frame

df_ram_size

Get the size of RAM in gigabytes

disk.frame

Create a disk.frame from a folder

disk.frame_to_parquet

A function to convert a disk.frame to parquet format

dplyr_verbs

The dplyr verbs implemented for disk.frame

evalparseglue

Helper function to evalparse some glue::glue string

find_globals_recursively

Find globals in an expression by searching through the chain

foverlaps.disk.frame

Apply data.table's foverlaps to the disk.frame

gen_datatable_synthetic

Generate synthetic dataset for testing

get_chunk

Obtain one chunk by chunk id

get_chunk_ids

Get the chunk IDs and file names

get_partition_paths

Get the partitioning structure of a folder

group_by

A function to parse the summarize function

groups.disk.frame

The shard keys of the disk.frame

head_tail

Head and tail of the disk.frame

is_disk.frame

Checks if a folder is a disk.frame

join

Performs join/merge for disk.frames

merge.disk.frame

Merge function for disk.frames

move_to

Move or copy a disk.frame to another location

nchunks

Returns the number of chunks in a disk.frame

ncol_nrow

Number of rows or columns

one-stage-group-by-verbs

One Stage function

overwrite_check

Check if the outdir exists or not

partition_filter

Filter the dataset based on folder partitions

play

Play the recorded lazy operations

print.disk.frame

Print disk.frame

pull.disk.frame

Pull a column from a table, similar to dplyr::pull.

purrr_as_mapper

Used to convert a function to purrr syntax if needed

rbindlist.disk.frame

rbindlist disk.frames together

rechunk

Increase or decrease the number of chunks in the disk.frame

recommend_nchunks

Recommend number of chunks based on input size

remove_chunk

Removes a chunk from the disk.frame

sample

Sample n rows from a disk.frame

setup_disk.frame

Set up disk.frame environment

shard

Shard a data.frame/data.table or disk.frame into chunks and save it in...

shardkey

Returns the shardkey (not implemented yet)

shardkey_equal

Compare two disk.frame shardkeys

show_ceremony

Show the code to setup disk.frame

split_string_into_df

Turn a string of the form /partition1=val/partition2=val2 into a data.frame

srckeep

Keep only the variables from the input listed in selections

sub-sub-.disk.frame

[[ interface for disk.frame using fst backend

tbl_vars.disk.frame

Column names for RStudio auto-complete

write_disk.frame

Write disk.frame to disk

zip_to_disk.frame

zip_to_disk.frame is used to read and convert every CSV file within ...

A disk-based data manipulation tool for working with larger-than-RAM datasets. It aims to lower the barrier to entry for manipulating large datasets by adhering closely to popular and familiar data manipulation paradigms like 'dplyr' verbs and 'data.table' syntax.

  • Maintainer: Dai ZJ
  • License: MIT + file LICENSE
  • Last published: 2023-08-24