Compute the matrix rank, a well-defined functional in theory(*), somewhat ambiguous in practice. We provide several methods, the default corresponding to Matlab's definition.
(*) The rank of a nxm matrix A, rk(A), is the maximal number of linearly independent columns (or rows); hence rk(A)<=min(n,m).
rankMatrix(x, tol =NULL, method = c("tolNorm2","qr.R","qrLINPACK","qr","useGrad","maybeGrad"), sval = svd(x,0,0)$d, warn.t =TRUE, warn.qr =TRUE)qr2rankMatrix(qr, tol =NULL, isBqr = is.qr(qr), do.warn =TRUE)
Arguments
x: numeric matrix, of dimension nxm, say.
tol: nonnegative number specifying a (relative, scalefree ) tolerance for testing of practically zero with specific meaning depending on method; by default, max(dim(x)) * .Machine$double.eps
is according to Matlab's default (for its only method which is our method="tolNorm2").
method: a character string specifying the computational method for the rank, can be abbreviated:
"tolNorm2":: the number of singular values >= tol * max(sval);
"qrLINPACK":: for a dense matrix, this is the rank of qr(x, tol, LAPACK=FALSE) (which is qr(...)$rank);
This ("qr*", dense) version used to be **the** recommended way to compute a matrix rank for a while in the past.
For sparse `x`, this is equivalent to `"qr.R"`.
"qr.R":: this is the rank of triangular matrix R, where qr() uses LAPACK or a "sparseQR" method (see qr-methods) to compute the decomposition QR. The rank of R is then defined as the number of non-zero diagonal entries di of R, and non-zero s fulfill ∣di∣>=tol∗max(∣di∣).
"qr":: is for back compatibility; for dense x, it corresponds to "qrLINPACK", whereas for sparse x, it uses "qr.R".
For all the "qr*" methods, singular values `sval` are not used, which may be crucially important for a large sparse matrix `x`, as in that case, when `sval` is not specified, the default, computing `svd()` currently coerces `x` to a dense matrix.
"useGrad":: considering the gradient of the (decreasing) singular values, the index of the smallest gap.
"maybeGrad":: choosing method "useGrad" only when that seems reasonable; otherwise using "tolNorm2".
sval: numeric vector of non-increasing singular values of x; typically unspecified and computed from x when needed, i.e., unless method = "qr".
warn.t: logical indicating if rankMatrix() should warn when it needs t(x) instead of x. Currently, for method = "qr" only, gives a warning by default because the caller often could have passed t(x) directly, more efficiently.
warn.qr: in the QR cases (i.e., if method starts with "qr"), rankMatrix() calls qr2rankMarix(.., do.warn = warn.qr), see below.
qr: an object resulting from qr(x,..), i.e., typically inheriting from class"qr" or "sparseQR".
isBqr: logical indicating if qr is resulting from baseqr(). (Otherwise, it is typically from Matrix package sparse qr.)
do.warn: logical; if true, warn about non-finite diagonal entries in the R matrix of the QR decomposition. Do not change lightly!
Details
qr2rankMatrix() is typically called from rankMatrix() for the "qr"* methods, but can be used directly - much more efficiently in case the qr-decomposition is available anyway.
Note
For large sparse matrices x, unless you can specify sval yourself, currently method = "qr" may be the only feasible one, as the others need sval and call svd() which currently coerces x to a denseMatrix which may be very slow or impossible, depending on the matrix dimensions.
Note that in the case of sparse x, method = "qr", all non-strictly zero diagonal entries di where counted, up to including Matrix version 1.1-0, i.e., that method implicitly used tol = 0, see also the set.seed(42) example below.
Returns
If x is a matrix of all 0 (or of zero dimension), the rank is zero; otherwise, typically a positive integer in 1:min(dim(x))
with attributes detailing the method used.
There are rare cases where the sparse QR decomposition fails in so far as the diagonal entries of R, the di (see above), end with non-finite, typically NaN
entries. Then, a warning is signalled (unless warn.qr / do.warn is not true) and NA (specifically, NA_integer_) is returned.
Author(s)
Martin Maechler; for the "*Grad" methods building on suggestions by Ravi Varadhan.
See Also
qr, svd.
Examples
rankMatrix(cbind(1,0,1:3))# 2(meths <- eval(formals(rankMatrix)$method))## a "border" case:H12 <- Hilbert(12)rankMatrix(H12, tol =1e-20)# 12; but 11 with default method & tol.sapply(meths,function(.m.) rankMatrix(H12, method = .m.))## tolNorm2 qr.R qrLINPACK qr useGrad maybeGrad## 11 11 12 12 11 11## The meaning of 'tol' for method="qrLINPACK" and *dense* x is not entirely "scale free"rMQL <-function(ex, M) rankMatrix(M, method="qrLINPACK",tol =10^-ex)rMQR <-function(ex, M) rankMatrix(M, method="qr.R", tol =10^-ex)sapply(5:15, rMQL, M = H12)# result is platform dependent## 7 7 8 10 10 11 11 11 12 12 12 {x86_64}sapply(5:15, rMQL, M =1000* H12)# not identical unfortunately## 7 7 8 10 11 11 12 12 12 12 12sapply(5:15, rMQR, M = H12)## 5 6 7 8 8 9 9 10 10 11 11sapply(5:15, rMQR, M =1000* H12)# the *same*## "sparse" case:M15 <- kronecker(diag(x=c(100,1,10)), Hilbert(5))sapply(meths,function(.m.) rankMatrix(M15, method = .m.))#--> all 15, but 'useGrad' has 14.sapply(meths,function(.m.) rankMatrix(M15, method = .m., tol =1e-7))# all 14## "large" sparsen <-250000; p <-33; nnz <-10000L <- sparseMatrix(i = sample.int(n, nnz, replace=TRUE), j = sample.int(p, nnz, replace=TRUE), x = rnorm(nnz))(st1 <- system.time(r1 <- rankMatrix(L)))# warning+ ~1.5 sec (2013)(st2 <- system.time(r2 <- rankMatrix(L, method ="qr")))# considerably faster!r1[[1]]== print(r2[[1]])## --> ( 33 TRUE )## another sparse-"qr" one, which ``failed'' till 2013-11-23:set.seed(42)f1 <- factor(sample(50,1000, replace=TRUE))f2 <- factor(sample(50,1000, replace=TRUE))f3 <- factor(sample(50,1000, replace=TRUE))D <- t(do.call(rbind, lapply(list(f1,f2,f3), as,'sparseMatrix')))dim(D); nnzero(D)## 1000 x 150 // 3000 non-zeros (= 2%)stopifnot(rankMatrix(D, method='qr')==148, rankMatrix(crossprod(D),method='qr')==148)## zero matrix has rank 0 :stopifnot(sapply(meths,function(.m.) rankMatrix(matrix(0,2,2), method = .m.))==0)