stochastic.logistic.regression function

Stochastic Logistic Regression

stochastic.logistic.regression(
  formula = NULL,
  pos_class = NULL,
  dim = NULL,
  intercept = TRUE,
  x0 = NULL,
  optimizer = "adaQN",
  optimizer_args = list(initial_step = 0.1, verbose = FALSE),
  lambda = 0.001,
  random_seed = 1,
  val_data = NULL
)

Arguments

  • formula: Formula for the model, if it is fit to data.frames instead of matrices/vectors.
  • pos_class: When fitting to data.frames, a string indicating which of the classes is the positive one. When fitting to matrices/vectors, pass NULL.
  • dim: Dimensionality of the model (number of features). Ignored when passing formula or when passing x0. If the intercept is added from the option intercept here, it should not be counted towards dim.
  • intercept: Whether to add an intercept to the covariates. Only used when fitting to matrices/vectors. Ignored when passing formula (for a formula without an intercept, put -1 in the RHS to remove it).
  • x0: Initial values of the variables. If passed, will ignore dim and random_seed. If not passed, will generate random starting values ~ Norm(0, 0.1).
  • optimizer: The optimizer to use: one of "adaQN" (recommended), "SQN", "oLBFGS".
  • optimizer_args: Arguments to pass to the optimizer (same ones as the functions of the same name). Must be a list. See the documentation of each optimizer for the parameters they take.
  • lambda: Regularization parameter. Be aware that the functions assume the loss (the negative log-likelihood) is divided by the number of observations, so this number should be small.
  • random_seed: Random seed to use for the initialization of the variables. Ignored when passing x0.
  • val_data: Validation data (only used for adaQN). If passed, must be a list with entries X, y (if passing data.frames for fitting), and optionally w (sample weights).
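
The note on lambda can be made concrete with a small base-R sketch. It does not use stochQN. The helper regularized_mean_logloss is hypothetical, and the penalty form lambda * sum(coefs^2) is an assumption for illustration, since this page does not state the exact penalty the package applies. The point it shows is only that a loss averaged over observations does not grow with the amount of data, so the penalty is weighed against a per-observation quantity.

```r
# Hypothetical helper (not part of stochQN): mean logistic loss plus an
# ASSUMED L2 penalty of the form lambda * sum(coefs^2). No intercept here.
regularized_mean_logloss <- function(coefs, X, y, lambda = 0.001) {
  z <- as.numeric(X %*% coefs)                    # linear predictor
  log1pexp <- pmax(z, 0) + log1p(exp(-abs(z)))    # stable log(1 + exp(z))
  # Negative log-likelihood per observation for y in {0, 1}, averaged
  mean(log1pexp - y * z) + lambda * sum(coefs^2)
}

X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width",
                        "Petal.Length", "Petal.Width")])
y <- as.numeric(iris$Species == "setosa")
coefs <- c(0.1, 0.2, -0.3, -0.1)                  # arbitrary illustrative values

# Doubling the data leaves the averaged objective unchanged
loss_once  <- regularized_mean_logloss(coefs, X, y)
loss_twice <- regularized_mean_logloss(coefs, rbind(X, X), c(y, y))
print(all.equal(loss_once, loss_twice))
```

Had the loss been summed rather than averaged, it would double with the data while the penalty stayed fixed, which is why a value that looks reasonable for a summed loss would over-regularize here.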

Returns

An object of class stoch_logistic, which can be fit to batches of data through the function partial_fit_logistic.

Details

Binary logistic regression, fit in batches using this package's own optimizers.

Examples

library(stochQN)

### Load Iris dataset
data("iris")

### Example with X + y interface
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width",
                        "Petal.Length", "Petal.Width")])
y <- as.numeric(iris$Species == "setosa")

### Initialize model with default parameters
model <- stochastic.logistic.regression(dim = 4)

### Fit to 10 randomly-subsampled batches
batch_size <- as.integer(nrow(X) / 3)
for (i in 1:10) {
  set.seed(i)
  batch <- sample(nrow(X), size = batch_size, replace = TRUE)
  ### Pass only the sampled rows, not the full data
  partial_fit_logistic(model, X[batch, ], y[batch])
}

### Check classification accuracy
cat(sprintf(
  "Accuracy after 10 iterations: %.2f%%\n",
  100 * mean(predict(model, X, type = "class") == y)
))

### Example with formula interface
iris_df <- iris
levels(iris_df$Species) <- c("setosa", "other", "other")

### Initialize model with default parameters
model <- stochastic.logistic.regression(Species ~ ., pos_class = "setosa")

### Fit to 10 randomly-subsampled batches
batch_size <- as.integer(nrow(iris_df) / 3)
for (i in 1:10) {
  set.seed(i)
  batch <- sample(nrow(iris_df), size = batch_size, replace = TRUE)
  ### Pass only the sampled rows, not the full data
  partial_fit_logistic(model, iris_df[batch, ])
}

### Check classification accuracy
cat(sprintf(
  "Accuracy after 10 iterations: %.2f%%\n",
  100 * mean(predict(model, iris_df, type = "class") == iris_df$Species)
))

See Also

partial_fit_logistic, coef.stoch_logistic, predict.stoch_logistic, adaQN, SQN, oLBFGS

  • Maintainer: David Cortes
  • License: BSD_2_clause + file LICENSE
  • Last published: 2021-09-26