Mixed N-Grams and Unigram Sequence Segmentation
Perform inverse regex search (C++)
Create n-grams dictionary
Segmenting sequences with n-grams
Mixed N-Grams and Unigram Sequence Segmentation (NUSS) function
Create unigram dictionary
Segmenting sequences with unigrams
Segments short text sequences, such as hashtags, into sequences of separate words using a dictionary, which may be built from a custom corpus of texts. The unigram dictionary is used to find the most probable word sequence, and the n-grams approach is used to determine the possible segmentations given the text corpus.
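The unigram step described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the package's own code or API: the function names `build_unigram_dict` and `segment`, the `max_word_len` parameter, and the toy corpus are all hypothetical. It assumes a word's probability is its relative frequency in the corpus and picks the split with the highest product of word probabilities by dynamic programming.

```python
import math
from collections import Counter


def build_unigram_dict(corpus):
    """Count lowercase word frequencies in an iterable of texts (hypothetical helper)."""
    counts = Counter()
    for text in corpus:
        counts.update(text.lower().split())
    return counts


def segment(sequence, counts, max_word_len=20):
    """Return the most probable split of `sequence` into dictionary words, or None."""
    total = sum(counts.values())
    n = len(sequence)
    # best[i] holds (log-probability, start index of the last word) for sequence[:i].
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = sequence[start:end]
            # Skip candidates that are not in the dictionary or extend an unreachable prefix.
            if word not in counts or best[start][0] == -math.inf:
                continue
            score = best[start][0] + math.log(counts[word] / total)
            if score > best[end][0]:
                best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None  # no segmentation uses only dictionary words
    # Walk the back-pointers from the end of the sequence to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(sequence[start:end])
        end = start
    return words[::-1]


# Toy corpus, assumed for illustration only.
counts = build_unigram_dict(["this is a test", "is this the best test", "a test of this"])
print(segment("thisisatest", counts))  # ['this', 'is', 'a', 'test']
```

A sequence containing characters that no dictionary word covers returns `None` in this sketch. Handling such cases, and restricting the candidate splits with n-grams, is left out here.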