Creates a Model Matrix via Feature Hashing with a Formula Interface
CSCMatrix
Extract mapping between hash and original values
Compute minimum hash size to reduce collision rate
Create a model matrix with feature hashing
Convert an integer to a raw vector with endian correction
Simulate how split works in hashed.model.matrix to split the string...
Feature hashing, also called the hashing trick, is a method for transforming the features of an instance into a vector, and therefore for transforming a real dataset into a matrix. Instead of looking up indices in an associative array, it applies a hash function to the features and uses their hash values directly as indices. The feature hashing method in this package was proposed in Weinberger et al. (2009) <arXiv:0902.2206>. The hashing algorithm is MurmurHash3 from the 'digest' package. Please see the README at <https://github.com/wush978/FeatureHashing> for more information.
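The idea of using hash values directly as indices can be illustrated with a minimal base R sketch. It is not the package's implementation: toy_hash and hash_features are hypothetical helpers written for this illustration, and the simple polynomial hash stands in for MurmurHash3 so that the example runs without any packages.

```r
# Toy string hash (an assumption for illustration, NOT MurmurHash3):
# a polynomial rolling hash over the code points of the key.
toy_hash <- function(key, hash_size) {
  h <- 0
  for (code in utf8ToInt(key)) {
    h <- (h * 31 + code) %% hash_size
  }
  h + 1  # shift to R's 1-based indexing
}

# Map one instance (a named list of categorical features) to a
# fixed-length vector. No dictionary of feature names is kept:
# the hash of "name" + "value" is used directly as the column index.
hash_features <- function(instance, hash_size = 16) {
  x <- numeric(hash_size)
  for (name in names(instance)) {
    key <- paste0(name, instance[[name]])
    idx <- toy_hash(key, hash_size)
    x[idx] <- x[idx] + 1  # colliding features add up in the same slot
  }
  x
}

row <- list(color = "red", shape = "circle")
x <- hash_features(row, hash_size = 16)
```

The vector length depends only on hash_size, not on how many distinct feature values exist in the data. A larger hash_size lowers the chance that two features collide in the same slot, at the cost of a wider matrix.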