NUSS 0.1.0 package

Mixed N-Grams and Unigram Sequence Segmentation

Segments short text sequences, such as hashtags, into sequences of separate words using a dictionary, which may be built from a custom corpus of texts. A unigram dictionary is used to find the most probable sequence, and an n-grams approach is used to determine the possible segmentations given the text corpus.
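
To illustrate the unigram step described above, here is a minimal sketch in Python. It is not the package's implementation or API: the function name `segment`, the toy `UNIGRAM_COUNTS` dictionary, and the dynamic-programming search are all assumptions made for illustration. It picks the split whose words have the highest product of unigram probabilities, and it does not cover the n-grams step.

```python
import math

# Hypothetical toy unigram counts; a real dictionary would be built from a corpus.
UNIGRAM_COUNTS = {"new": 50, "york": 20, "city": 30, "news": 10, "a": 100, "i": 80}


def segment(text, counts):
    """Return the most probable split of `text` into dictionary words, or None."""
    total = sum(counts.values())
    n = len(text)
    max_len = max(len(word) for word in counts)
    # best[i] holds (log-probability, words) for the best split of text[:i],
    # or None if text[:i] cannot be split into dictionary words.
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if best[start] is None or word not in counts:
                continue
            # Sum log-probabilities instead of multiplying probabilities.
            score = best[start][0] + math.log(counts[word] / total)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1] if best[n] is not None else None


print(segment("newyorkcity", UNIGRAM_COUNTS))
```

With this toy dictionary the only valid split of `newyorkcity` is `new`, `york`, `city`. Input that cannot be covered by dictionary words yields `None` in this sketch; how the package itself handles such cases is not stated here.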

  • Maintainer: Oskar Kosch
  • License: GPL (>= 3)
  • Last published: 2024-08-19