NUSS R package [Documentation]

igrepl

Perform inverse regex search (C++)

ngrams_dictionary

Create n-grams dictionary

ngrams_segmentation

Segmenting sequences with n-grams

nuss

Mixed N-Grams and Unigram Sequence Segmentation (NUSS) function

unigram_dictionary

Create unigram dictionary

unigram_sequence_segmentation

Segmenting sequences with unigrams

Segmentation of short text sequences, such as hashtags, into sequences of separate words, using a dictionary that may be built from a custom corpus of texts. A unigram dictionary is used to find the most probable word sequence, and an n-grams approach is used to determine which segmentations are possible given the text corpus.
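
The unigram idea described above can be illustrated with a minimal sketch. It is written in Python as a language-neutral illustration and is not the package's implementation or API: the function name `segment`, the `max_word_len` parameter, and the toy word counts are all assumptions made for this example. It uses dynamic programming to pick the split whose words have the highest combined unigram probability.

```python
import math


def segment(sequence, unigram_counts, max_word_len=20):
    """Return the most probable split of `sequence` into dictionary words.

    Hypothetical helper for illustration only; not part of NUSS.
    Returns None when no split covers the whole sequence.
    """
    total = sum(unigram_counts.values())
    n = len(sequence)
    # best[i] holds (log-probability, start of last word) for sequence[:i]
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            count = unigram_counts.get(sequence[start:end])
            if count is None or best[start][0] == -math.inf:
                continue  # unknown word, or prefix cannot be segmented
            score = best[start][0] + math.log(count / total)
            if score > best[end][0]:
                best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Walk the stored split points backwards to recover the words
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(sequence[start:end])
        end = start
    return words[::-1]


# Toy counts, invented for this example
toy_counts = {"this": 50, "is": 80, "a": 100, "hashtag": 5, "hash": 10, "tag": 10}
print(segment("thisisahashtag", toy_counts))  # ['this', 'is', 'a', 'hashtag']
```

With these toy counts, "hashtag" as one word is more probable than "hash" followed by "tag", because multiplying two small probabilities gives a smaller value than the single-word probability.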

Maintainer: Oskar Kosch
License: GPL (>= 3)
Last published: 2024-08-19

Useful links

https://github.com/theogrost/NUSS/issues
https://github.com/theogrost/NUSS

NUSS 0.1.0 package
