A Text Mining Toolkit for Chinese
Print the UTF-8 codes of a string.
Create a Chinese term-document matrix or a document-term matrix.
Create a word frequency data.frame.
Get the current encoding of the locale.
Indicate whether the encoding of the input string is BIG5.
Indicate whether the encoding of the input string is GB18030.
Indicate whether the encoding of the input string is GB2312.
Indicate whether the encoding of the input string is GBK.
Indicate whether the encoding of the input string is UTF-8.
Extract the left or right substrings from a character vector.
Revert UTF-8 codes to Chinese characters.
Set locale to Simplified Chinese/Traditional Chinese/UK.
Return Chinese stop words.
Mixed case capitalizing.
Extract matched substrings by regular expression.
Pad a string to a specified length with a padding character.
Trim whitespace from a string.
Convert Chinese text to pinyin.
Convert a Chinese text from simplified to traditional characters and v...
Convert the encoding of a Chinese string to UTF-8.
A text mining toolkit for Chinese that provides facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. It also provides functions that support the 'tm' package for Chinese text.
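
The encoding tasks listed above (printing the codes of a string, reverting codes to characters, checking for UTF-8, converting between encodings) can be illustrated with a minimal sketch in base R. It deliberately does not call the toolkit's own functions, whose names and signatures are not shown here; the variable names are hypothetical, and the GBK step assumes the platform's iconv supports that encoding.

```r
# Base R only; an illustration of the ideas, not the toolkit's implementation.
x <- "\u4e2d\u6587"                       # two Chinese characters, written as escapes

# Unicode code points of each character, and a printable form of them.
codes <- utf8ToInt(x)                     # 20013 25991
hex_codes <- sprintf("U+%04X", codes)     # "U+4E2D" "U+6587"

# Revert the code points to the original characters.
restored <- intToUtf8(codes)

# Check whether the string is valid UTF-8.
is_utf8 <- validUTF8(x)

# Convert to GBK bytes and back to UTF-8.
# Assumption: GBK is available in the local iconv; if not, the results are NULL/NA.
gbk_bytes <- iconv(x, from = "UTF-8", to = "GBK", toRaw = TRUE)[[1]]
round_trip <- iconv(list(gbk_bytes), from = "GBK", to = "UTF-8")
```

Working with raw bytes for the GBK step avoids depending on the session locale, which is one reason a toolkit like this offers locale and encoding helpers in the first place.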