A fast and flexible replacement for dist that computes Euclidean distances.
Usage
fdist(x, v = NULL, ..., method = "euclidean", nthreads = .op[["nthreads"]])
Arguments
x: a numeric vector or matrix. Data frames/lists can be passed but will be converted to a matrix using qM. Inputs that are not of type double will be coerced.
v: an (optional) numeric (double) vector such that length(v) == NCOL(x), to compute distances with (the rows of) x. Other vector types will be coerced.
...: not used. A placeholder for possible future arguments.
method: an integer or character string indicating the method of computing distances.
Int.   String                Description
1      "euclidean"           Euclidean distance
2      "euclidean_squared"   squared Euclidean distance (more efficient)
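As an illustrative sketch of the method argument (the mtcars subset and the variable names are arbitrary choices, not part of the documentation), the integer and string forms should select the same computation, and the squared variant should equal the square of the Euclidean result:

```r
library(collapse)

# Small numeric matrix, chosen only for illustration
m <- as.matrix(mtcars[1:5, c("mpg", "hp", "wt")])

d_int <- fdist(m, method = 1L)                   # integer code for "euclidean"
d_chr <- fdist(m, method = "euclidean")          # equivalent string form
d_sq  <- fdist(m, method = "euclidean_squared")  # skips the square root

# Compare the underlying values only (attributes of the 'dist' objects dropped)
all.equal(as.numeric(d_int), as.numeric(d_chr))
all.equal(as.numeric(d_sq), as.numeric(d_int)^2)
```

If only the ordering of distances matters (for example when ranking neighbours), the squared variant gives the same ranking while avoiding the square root.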
nthreads: integer. The number of threads to use. If v = NULL (full distance matrix), multithreading is along the distance matrix columns (thread loads decrease from column to column because the matrix is lower triangular). If v is supplied, multithreading is at the sub-column level (across elements).
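A minimal sketch of the nthreads argument, assuming a build where multithreading is available (where it is not, the extra threads are assumed simply to go unused). The thread count should affect speed only, so both calls are expected to agree:

```r
library(collapse)

m <- as.matrix(mtcars)
v <- fmean(m)

# Full distance matrix: threads are split along the distance matrix columns
d1 <- fdist(m, nthreads = 1L)
d2 <- fdist(m, nthreads = 2L)
all.equal(as.numeric(d1), as.numeric(d2))

# Distances to a vector: threads are split across the elements
p1 <- fdist(m, v, nthreads = 1L)
p2 <- fdist(m, v, nthreads = 2L)
all.equal(as.numeric(p1), as.numeric(p2))
```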
Returns
If v = NULL, a full lower-triangular distance matrix between the rows of x is computed and returned as a 'dist' object (all methods apply, see dist). Otherwise, a numeric vector of distances of each row of x with v is returned. See Examples.
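The two return shapes can be sketched as follows. The base R reference computation via sweep and rowSums is an addition for comparison only and is not how fdist is implemented:

```r
library(collapse)

m <- as.matrix(mtcars)

# v = NULL: a 'dist' object, so the usual dist methods such as as.matrix() apply
d_full <- fdist(m)
inherits(d_full, "dist")
as.matrix(d_full)[1:3, 1:3]

# v supplied: one distance per row of m
v <- fmean(m)                 # column means, length(v) == ncol(m)
d_vec <- fdist(m, v)
length(d_vec) == nrow(m)

# Base R reference: subtract v from every row, then take row-wise norms
ref <- sqrt(rowSums(sweep(m, 2, v)^2))
all.equal(as.numeric(d_vec), as.numeric(ref))
```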
Note
fdist does not check for missing values, so NA's will result in NA distances.
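One possible way to deal with missing values is to drop incomplete rows before calling fdist. This is a sketch of a workaround using base R's complete.cases, not a feature of fdist itself. The injected NA and the variable names are illustrative:

```r
library(collapse)

m <- as.matrix(mtcars[1:6, c("mpg", "hp", "wt")])
m[2, "hp"] <- NA                       # inject a missing value for illustration

v <- fmean(m, na.rm = TRUE)            # column means, ignoring the NA

d <- fdist(m, v)
is.na(d[2])                            # the row containing NA yields an NA distance

# Keep only rows without missing values, then compute distances
ok <- complete.cases(m)
d_ok <- fdist(m[ok, , drop = FALSE], v)
anyNA(d_ok)
```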
kit::topn is a suitable complementary function to find nearest neighbors. It is very efficient and skips missing values by default.
See Also
flm, Fast Statistical Functions, Collapse Overview
Examples
library(collapse)

# Distance matrix
m = as.matrix(mtcars)
str(fdist(m)) # Same as dist(m)

# Distance with vector
d = fdist(m, fmean(m))
# topn() returns the largest values by default, so request the smallest
kit::topn(d, 5, decreasing = FALSE) # Index of 5 nearest neighbours

# Mahalanobis distance
m_mahal = t(forwardsolve(t(chol(cov(m))), t(m)))
fdist(m_mahal, fmean(m_mahal))
sqrt(unattrib(mahalanobis(m, fmean(m), cov(m))))

# Distance of two vectors
x <- rnorm(1e6)
y <- rnorm(1e6)
microbenchmark::microbenchmark(
  fdist(x, y),
  fdist(x, y, nthreads = 2),
  sqrt(sum((x-y)^2))
)