sketch function

Sketch

Provides a subsample of the data using a sketching method.

sketch(data, m, method = "unif")

Arguments

  • data: (n times d)-dimensional matrix of data.
  • m: (expected) subsample size; must be less than n.
  • method: method for sketching: "unif" uniform sampling with replacement (default); "unif_without_replacement" uniform sampling without replacement; "bernoulli" Bernoulli sampling; "gaussian" Gaussian projection; "countsketch" CountSketch; "srht" subsampled randomized Hadamard transform; "fft" subsampled randomized trigonometric transform using the real part of the fast discrete Fourier transform (stats::fft).
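
To give a feel for what one of these methods does, the following is a minimal base-R illustration of the CountSketch idea: every row is assigned to one of m buckets at random, multiplied by a random sign, and the rows in each bucket are summed. The helper countsketch_demo is hypothetical and is not part of the package; the package's own implementation may differ in details such as hashing and scaling.

```r
# Hypothetical helper for illustration only; not the package's implementation.
countsketch_demo <- function(data, m) {
  n <- nrow(data)
  bucket <- sample.int(m, n, replace = TRUE)    # random bucket in 1..m for each row
  sgn <- sample(c(-1, 1), n, replace = TRUE)    # random sign for each row
  out <- matrix(0, nrow = m, ncol = ncol(data)) # empty buckets stay as zero rows
  # data * sgn flips the sign of whole rows; rowsum() adds rows sharing a bucket
  sums <- rowsum(data * sgn, group = bucket)
  out[as.integer(rownames(sums)), ] <- sums
  out
}

set.seed(1)
demo_data <- matrix(stats::rnorm(200 * 3), nrow = 200, ncol = 3)
demo_sketch <- countsketch_demo(demo_data, m = 10)
dim(demo_sketch)
```

The result always has exactly m rows, because each bucket yields one row regardless of how many original rows fall into it.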

Returns

An (m times d)-dimensional matrix of data. For Bernoulli sampling, the number of rows is not necessarily m.
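
The reason the row count can vary under Bernoulli sampling can be seen in a small base-R sketch. It assumes each row is kept independently with probability m/n, which matches the description of m as an expected subsample size; that inclusion probability is an assumption, and the package may additionally rescale the retained rows.

```r
set.seed(1)
n <- 10000  # full sample size
m <- 100    # expected subsample size
demo_data <- matrix(stats::rnorm(n * 2), nrow = n, ncol = 2)

# Assumed rule: keep each row independently with probability m / n
keep <- stats::runif(n) < m / n
bern_sub <- demo_data[keep, , drop = FALSE]

# The number of retained rows is random; only its expectation equals m
nrow(bern_sub)
```

Rerunning with a different seed generally gives a different number of rows, so downstream code should use nrow() on the result rather than assume m.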

Examples

## Least squares: sketch and solve
# setup
n <- 1e+6 # full sample size
d <- 5    # dimension of covariates
m <- 1e+3 # sketch size
# generate pseudo-data
X <- matrix(stats::rnorm(n * d), nrow = n, ncol = d)
beta <- matrix(rep(1, d), nrow = d, ncol = 1)
eps <- matrix(stats::rnorm(n), nrow = n, ncol = 1)
Y <- X %*% beta + eps
intercept <- matrix(rep(1, n), nrow = n, ncol = 1)
# full sample including the intercept term
fullsample <- cbind(Y, intercept, X)
# generate a sketch using CountSketch
s_cs <- sketch(fullsample, m, "countsketch")
# regress the sketched response (column 1) on all sketched regressors;
# "- 1" drops lm()'s own intercept because the sketched intercept column is already included
ls_cs <- lm(s_cs[, 1] ~ s_cs[, -1] - 1)
# generate a sketch using SRHT
s_srht <- sketch(fullsample, m, "srht")
# same regression on the SRHT sketch
ls_srht <- lm(s_srht[, 1] ~ s_srht[, -1] - 1)

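
As a rough check of the sketch-and-solve idea, the sketched estimates can be set side by side with the full-sample fit. The snippet below is self-contained and uses only base R: uniform sampling with replacement is done by hand with sample.int() as a stand-in for sketch(fullsample, m, "unif"), so it may differ from the package in details such as row scaling. The sample sizes are reduced to keep it quick.

```r
set.seed(123)
n <- 1e4  # full sample size (smaller than in the example above)
d <- 5    # dimension of covariates
m <- 500  # sketch size

X <- matrix(stats::rnorm(n * d), nrow = n, ncol = d)
Y <- X %*% rep(1, d) + stats::rnorm(n)
fullsample <- cbind(Y, 1, X)  # response, intercept column, covariates

# Stand-in for uniform sampling with replacement (assumption: no rescaling)
idx <- sample.int(n, m, replace = TRUE)
s_unif <- fullsample[idx, , drop = FALSE]

# Same regression on the full sample and on the subsample
fit_full <- lm(fullsample[, 1] ~ fullsample[, -1] - 1)
fit_sketch <- lm(s_unif[, 1] ~ s_unif[, -1] - 1)

# Coefficients side by side: first row is the intercept, the rest are slopes
cbind(full = coef(fit_full), sketch = coef(fit_sketch))
```

Both columns should be close to the true values used to generate the data (0 for the intercept, 1 for each slope), with the sketched fit being noisier because it uses only m rows.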
  • Maintainer: Sokbae Lee
  • License: GPL-3
  • Last published: 2022-09-07