tmcn R package [Documentation]

catUTF8

Print the UTF-8 codes of a string.

createDTM

Create a Chinese term-document matrix or a document-term matrix.

createWordFreq

Create a word frequency data.frame.

getCharset

Get the current encoding of the locale.

isBIG5

Indicate whether the encoding of the input string is BIG5.

isGB18030

Indicate whether the encoding of the input string is GB18030.

isGB2312

Indicate whether the encoding of the input string is GB2312.

isGBK

Indicate whether the encoding of the input string is GBK.

isUTF8

Indicate whether the encoding of the input string is UTF-8.

left

Extract the left or right substrings in a character vector.

revUTF8

Convert UTF-8 codes back to Chinese characters.

setchs

Set locale to Simplified Chinese/Traditional Chinese/UK.

stopwordsCN

Return Chinese stop words.

strcap

Capitalize a string in mixed case.

strextract

Extract matched substrings by regular expression.

strpad

Pad a string to a specified length with a padding character.

strstrip

Trim whitespace from a string.

toPinyin

Convert Chinese text to pinyin.

toTrad

Convert Chinese text from simplified to traditional characters and v...

toUTF8

Convert the encoding of a Chinese string to UTF-8.
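
To make the string helpers in the list above more concrete, here is a minimal sketch of comparable operations in base R. It does not call tmcn and is not the package's implementation; the variable names and sample strings are made up for illustration, and the tmcn functions themselves may take different arguments.

```r
# Base R analogues of a few helpers listed above (not tmcn's own code).
x <- "  hello world  "

# Trim whitespace from both ends (compare strstrip).
trimmed <- trimws(x)

# Left and right substrings of n characters (compare left).
n <- 5
left_part  <- substr(trimmed, 1, n)
right_part <- substr(trimmed, nchar(trimmed) - n + 1, nchar(trimmed))

# Pad on the left to a fixed width with a chosen character (compare strpad).
s <- "42"
width <- 6
padded <- paste0(strrep("0", max(0, width - nchar(s))), s)

# Capitalize the first letter of each word (compare strcap).
capped <- gsub("\\b([a-z])", "\\U\\1", trimmed, perl = TRUE)

# Extract every substring that matches a regular expression (compare strextract).
digits <- regmatches("a1b22c333", gregexpr("[0-9]+", "a1b22c333"))[[1]]

print(list(trimmed, left_part, right_part, padded, capped, digits))
```

The sketch sticks to `trimws`, `substr`, `strrep`, `gsub`, and `regmatches`, so it runs in a plain R session (R 3.3 or later) with no packages attached.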


A text mining toolkit for Chinese. It includes facilities for Chinese string processing, Chinese NLP support, and encoding detection and conversion. It also provides functions that support using the 'tm' package with Chinese text.
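
As a rough illustration of what encoding detection and conversion involve, the following base R sketch inspects and converts a short Chinese string. It uses only base R functions (`utf8ToInt`, `intToUtf8`, `validUTF8`, `iconv`) rather than tmcn itself, so it shows the underlying idea, not the package's API. The GBK round trip assumes the platform's iconv supports that encoding; where it does not, `iconv` returns `NA`.

```r
# UTF-8 text written with escapes so the script itself stays ASCII.
s <- "\u4e2d\u6587"

# Code points of each character (compare catUTF8).
codes <- utf8ToInt(s)
labels <- sprintf("U+%04X", codes)   # "U+4E2D" "U+6587"

# Rebuild the string from its code points (compare revUTF8).
rebuilt <- intToUtf8(codes)

# Check that the string is valid UTF-8 (compare isUTF8).
is_valid <- validUTF8(s)

# Convert to GBK and back to UTF-8 (compare toUTF8).
# iconv() yields NA when the platform cannot perform the conversion.
gbk  <- iconv(s, from = "UTF-8", to = "GBK")
back <- iconv(gbk, from = "GBK", to = "UTF-8")

print(labels)
print(is_valid)
```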

Maintainer: Jian Li
License: LGPL
Last published: 2019-08-08

tmcn 0.2-13 package
