Generalized Linear Models (GLM) for Large Data Sets
converts numeric vector to integer
Function to carry out generalized linear regression on a data_frame da...
predict function for bglm object
binomial family function
Function to carry out linear regression on a data_frame data object
creates factor from numeric vector and character vector as levels
function to create a data_frame object
function to create a data_frame object
Function for creating control parameters for the GLM fit
family function
Gamma family function
gaussian family function
inverse.gaussian family function
function to load data_frame object
function to load data_frame object
finds whether x is in y
mySeq function to generate a sequence of integers
poisson family function
print function for the bglm object
print function for the blm object
print function for a data_frame
print function for a data_matrix
Function to print the summary object from the bglm object
Function to print the summary object from the blm object
Function to print the summary object from the blm object
quasi family function
quasibinomial family function
quasipoisson family function
row binding for benchmarking ...
read data frame block from file
read multiple blocks of data frames from file
read matrix block from file
read matrix blocks from file
reads numeric vector from file
The reduction function for the algorithm
summary function for the bglm object
summary function for the blm object
Singular value decomposition of the aggregated list from XWXMatrix(W) ...
writes numeric vector to file
writes numeric vector to file
Calculation of iterative regression components
Calculation of iterative regression components
Allows the user to fit generalized linear models (GLMs) to very large data sets. A data object is created with the data_frame() function, and further data is appended to it with object$append(data); both data_frame and data_matrix objects are available for storing large data on disk. The data is stored as doubles in binary format. Character columns are converted to factors and stored as numeric (binary) data, with a look-up table kept in a separate .meta_data file in the same folder. Because the data is stored in blocks, the GLM fitting algorithm is modified to run as a MapReduce-like procedure over those blocks. The functions bglm(), summary() and bglm_predict() are available for fitting and post-processing models. The library requires Armadillo to be installed on your system. It may not work on Windows, because multi-core processing uses mclapply(), which relies on forking the R process and is therefore only effective on Unix/Linux-type operating systems.
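
The block-wise fitting idea described above can be illustrated with a small sketch. This is not the package's own code (which is written in R and C++), and every name in it (map_block, reduce_blocks, fit_blockwise, solve) is hypothetical. It assumes a Poisson family with a log link and uses iteratively reweighted least squares: each block contributes its own X'WX and X'Wz terms (the "map" step), the contributions are summed (the "reduce" step), and the summed system is solved for the coefficients. Python's standard library is used only so that the example is self-contained. The map step runs sequentially here; it is the part that could be spread across cores.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def map_block(block, beta):
    """Map step: one block's contribution to X'WX and X'Wz.

    Assumes a Poisson family with log link (an illustrative choice).
    """
    p = len(beta)
    xtwx = [[0.0] * p for _ in range(p)]
    xtwz = [0.0] * p
    for x, y in block:
        eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor
        mu = math.exp(eta)                             # inverse log link
        w = mu                                         # IRLS weight
        z = eta + (y - mu) / mu                        # working response
        for i in range(p):
            xtwz[i] += x[i] * w * z
            for j in range(p):
                xtwx[i][j] += x[i] * w * x[j]
    return xtwx, xtwz

def reduce_blocks(parts):
    """Reduce step: sum the per-block contributions element-wise."""
    p = len(parts[0][1])
    xtwx = [[sum(part[0][i][j] for part in parts) for j in range(p)]
            for i in range(p)]
    xtwz = [sum(part[1][i] for part in parts) for i in range(p)]
    return xtwx, xtwz

def fit_blockwise(blocks, p, max_iter=50, tol=1e-10):
    """Iterate map, reduce and solve until the coefficients stop changing."""
    beta = [0.0] * p
    for _ in range(max_iter):
        parts = [map_block(block, beta) for block in blocks]
        xtwx, xtwz = reduce_blocks(parts)
        new_beta = solve(xtwx, xtwz)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Noise-free synthetic data: y = exp(0.5 + 0.3 * x), split into three blocks.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
rows = [([1.0, x], math.exp(0.5 + 0.3 * x)) for x in xs]
blocks = [rows[0:2], rows[2:4], rows[4:6]]
beta = fit_blockwise(blocks, p=2)
print(beta)  # close to [0.5, 0.3]
```

Only the small p-by-p and length-p summaries leave each block, so the full data set never has to be held in memory at once, and the fitted coefficients do not depend on how the rows are divided into blocks.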