Japanese Text Processing Tools
Prettify tokenized output
Read a rewrite.def file
Fill Japanese iteration marks
Hiraganize Japanese characters
Katakanize Japanese characters
Convert text following the rules of 'NEologd'
Rewrite text using rewrite.def
Romanize Japanese Hiragana and Katakana
Segment text into tokens
Segment text into phrases
Pack a data.frame of tokens
Bind importance of bigrams
Bind term frequency and inverse document frequency
Collapse sequences of tokens by condition
Get dictionary's features
Calculate lexical density
Mute tokens by condition
N-gram tokenizer
Split text into tokens
Transcribe Arabic numerals to Kansuji
audubon: Japanese Text Processing Tools
A collection of Japanese text processing tools for filling Japanese iteration marks, converting between Japanese character types, segmenting text by phrase, and normalizing text according to rules for the 'Sudachi' morphological analyzer and 'NEologd' (the Neologism dictionary for 'MeCab'). These features are specific to Japanese and are not implemented in 'ICU' (International Components for Unicode).