data: (n times d)-dimensional data matrix. The first column must contain the dependent variable (Y).
m: subsample size that is less than n
method: method for sketching. "leverage": leverage score sampling using X (default); "root_leverage": square-root leverage score sampling using X.
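To make the two sampling schemes concrete, the following base-R sketch computes leverage scores for a small toy matrix and turns them into sampling probabilities. It is an illustration under stated assumptions, not the package's implementation: the toy data, the use of a QR decomposition, and the names `lev`, `prob_lev`, and `prob_root` are all hypothetical.

```r
# Toy data (assumed layout): first column is Y, remaining columns are X.
set.seed(1)
n <- 200
d <- 3
X <- matrix(stats::rnorm(n * d), nrow = n, ncol = d)
Y <- X %*% rep(1, d) + stats::rnorm(n)
data <- cbind(Y, X)

# Leverage scores: squared row norms of Q from the thin QR decomposition of X.
# These equal the diagonal of the hat matrix X (X'X)^{-1} X'.
Xmat <- data[, -1, drop = FALSE]
Q <- qr.Q(qr(Xmat))
lev <- rowSums(Q^2)

# "leverage": probabilities proportional to the leverage scores.
prob_lev <- lev / sum(lev)

# "root_leverage": probabilities proportional to the square roots of the scores.
prob_root <- sqrt(lev) / sum(sqrt(lev))
```

For a full-rank X the leverage scores sum to the number of columns of X, so `prob_lev` is the same as `lev / ncol(Xmat)` here. Taking square roots flattens the distribution, so high-leverage rows are oversampled less aggressively.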
Returns
An S3 object with the following elements:
subsample: (m times d)-dimensional matrix of data
prob: m-dimensional vector of probabilities
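The shape of the returned elements can be illustrated with a hypothetical stand-in written in base R. `toy_sketch_leverage` is not the package function. It assumes that rows are drawn with replacement and that `prob` holds the sampling probabilities of the selected rows, and it returns a plain list rather than a classed S3 object.

```r
# Hypothetical stand-in, not the package's code: sample m rows with replacement
# using leverage-based probabilities computed from the regressor columns.
toy_sketch_leverage <- function(data, m, method = "leverage") {
  n <- nrow(data)
  stopifnot(m < n)
  # Leverage scores of the regressors (all columns except the first).
  Q <- qr.Q(qr(data[, -1, drop = FALSE]))
  lev <- rowSums(Q^2)
  score <- switch(method,
                  leverage = lev,
                  root_leverage = sqrt(lev),
                  stop("unknown method: ", method))
  prob <- score / sum(score)
  idx <- sample.int(n, size = m, replace = TRUE, prob = prob)
  # Assumption: 'prob' reports the probabilities of the rows that were drawn.
  list(subsample = data[idx, , drop = FALSE], prob = prob[idx])
}

set.seed(1)
n <- 500
d <- 4
data <- matrix(stats::rnorm(n * d), nrow = n, ncol = d)
s <- toy_sketch_leverage(data, m = 50)
dim(s$subsample)  # m rows, d columns
length(s$prob)    # m entries
```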
Examples
## Least squares: sketch and solve
# setup
n <- 1e+6  # full sample size
d <- 5     # dimension of covariates
m <- 1e+3  # sketch size
# generate pseudo-data
X <- matrix(stats::rnorm(n * d), nrow = n, ncol = d)
beta <- matrix(rep(1, d), nrow = d, ncol = 1)
eps <- matrix(stats::rnorm(n), nrow = n, ncol = 1)
Y <- X %*% beta + eps
intercept <- matrix(rep(1, n), nrow = n, ncol = 1)
# full sample including the intercept term
fullsample <- cbind(Y, intercept, X)
# generate a sketch using leverage score sampling
s_lev <- sketch_leverage(fullsample, m, "leverage")
# solve with weighting; "- 1" drops lm's automatic intercept because the
# subsample already carries an explicit intercept column
ls_lev <- lm(s_lev$subsample[, 1] ~ s_lev$subsample[, -1] - 1, weights = s_lev$prob)
References
Ma, P., Zhang, X., Xing, X., Ma, J. and Mahoney, M. (2020). Asymptotic Analysis of Sampling Estimators for Randomized Numerical Linear Algebra Algorithms. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:1026-1035.