Creates a dataset that helps batch long-running reads and writes
The function returns a base::data.frame() that other functions use to separate long-running REDCap read and write calls into multiple, smaller REDCap calls. The goal is to (1) reduce the chance of time-outs, and (2) introduce brief pauses between batches so that the server isn't continually tied up.
create_batch_glossary(row_count, batch_size)
Arguments
row_count: The number of records in the large dataset, before it is split.
batch_size: The maximum number of subject records a single batch should contain.
Returns
Currently, a base::data.frame() is returned with the following columns:
id: an integer that uniquely identifies the batch, starting at 1.
start_index: the index of the first row in the batch; integer.
stop_index: the index of the last row in the batch; integer.
id_pretty: a character representation of id, but padded with zeros.
start_index_pretty: a character representation of start_index, but padded with zeros.
stop_index_pretty: a character representation of stop_index, but padded with zeros.
label: a character concatenation of id_pretty, start_index_pretty, and stop_index_pretty.
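A caller can loop over the glossary's rows and use start_index and stop_index to subset the larger dataset one batch at a time. The sketch below is a minimal illustration, not code from the package itself; the ds data frame and the loop body are assumptions.

ds       <- data.frame(record_id = 1:100, value = rnorm(100))
glossary <- REDCapR::create_batch_glossary(row_count = nrow(ds), batch_size = 40)

for (batch in seq_len(nrow(glossary))) {
  # Each glossary row delimits one contiguous slice of the larger dataset.
  rows     <- seq(glossary$start_index[batch], glossary$stop_index[batch])
  ds_batch <- ds[rows, , drop = FALSE]
  # ...pass `ds_batch` to the long-running read or write call here...
}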
Details
This function can also assist in splitting and saving a large data frame to disk as smaller files (such as a .csv). The padded columns allow the OS to sort the batches/files in sequential order.
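A hedged sketch of that use, assuming the label column concatenates the padded id and indices (the ds data frame, file names, and directory are illustrative, not part of the package):

ds       <- data.frame(record_id = 1:1000, value = rnorm(1000))
glossary <- REDCapR::create_batch_glossary(row_count = nrow(ds), batch_size = 200)

for (batch in seq_len(nrow(glossary))) {
  rows      <- seq(glossary$start_index[batch], glossary$stop_index[batch])
  # The zero-padded `label` makes the files sort sequentially in a directory listing.
  file_name <- file.path(tempdir(), paste0("batch-", glossary$label[batch], ".csv"))
  write.csv(ds[rows, ], file_name, row.names = FALSE)
}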
Examples
REDCapR::create_batch_glossary(100, 50)
REDCapR::create_batch_glossary(100, 25)
REDCapR::create_batch_glossary(100, 3)
REDCapR::create_batch_glossary(0, 3)
d <- data.frame(
  record_id = 1:100,
  iv        = sample(x = 4, size = 100, replace = TRUE),
  dv        = rnorm(n = 100)
)
REDCapR::create_batch_glossary(nrow(d), batch_size = 40)
See Also
See redcap_read() for a function that uses create_batch_glossary.
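For REDCap reads specifically, redcap_read() applies this batching internally through its batch_size argument, so callers rarely construct the glossary themselves. A hedged usage sketch (the uri and token are placeholders for a real project):

ds_all <- REDCapR::redcap_read(
  redcap_uri = "https://your-institution.example/redcap/api/",
  token      = "YOUR_API_TOKEN",
  batch_size = 50
)$data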