chinese.misc 0.2.3 package

Miscellaneous Tools for Chinese Text Mining and More

seg_file

Convenient Tool to Segment Chinese Texts

slim_text

Remove Words through Speech Tagging

as.character2

An Enhanced Version of as.character

as.numeric2

An Enhanced Version of as.numeric

chinese.misc-package

Miscellaneous Tools for Chinese Text Mining and More

corp_or_dtm

Create Corpus or Document Term Matrix with 1 Line

create_ttm

Create Term-Term Matrix (Term-Cooccurrence Matrix)

csv2txt

Write Texts in CSV into Many TXT/RTF Files

DEFAULT_control1

A Default Value for corp_or_dtm 1

DEFAULT_control2

A Default Value for corp_or_dtm 2

DEFAULT_cutter

A Default Cutter

dictionary_dtm

Making DTM/TDM for Groups of Words

dir_or_file

Collect Full Filenames from a Mix of Directories and Files

get_tag_word

Extract Words of Some Certain Tags through Pos-Tagging

get_tmp_chi_locale

Check The Locale Functions are to Assume

is_character_vector

A Convenient Version of is.character

is_positive_integer

A Convenient Version of is.integer

m2doc

Rewrite Terms and Frequencies into Many Files

m3m

Convert Objects among matrix, dgCMatrix, simple_triplet_matrix, Docume...

make_stoplist

Input a Filename and Return a Vector of Stop Words

match_pattern

Extract Strings by Regular Expression Quickly

output_dtm

Convert or Write DTM/TDM Object Quickly

scancn

Read a Text File by Auto-Detecting Encoding

sort_tf

Find High Frequency Terms

sparse_left

Check How many Words are Left under Certain Sparse Values

tf2doc

Transform Terms and Frequencies into a Text

topic_trend

Simple Rise or Fall Trend of Several Years

txt2csv

Write Many Separated Files into a CSV

V

Copy and Paste from Excel-Like Files

VC

Copy and Paste from Excel-Like Files

VCR

Copy and Paste from Excel-Like Files

VR

Copy and Paste from Excel-Like Files

VRC

Copy and Paste from Excel-Like Files

word_cor

Word Correlation in DTM/TDM

This package aims to make Chinese text mining easier, faster, and more robust to errors. A document-term matrix can be generated with a single line of code; encoding detection, word segmentation, and stop-word removal are handled automatically. Some convenient tools are also supplied.
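As a rough illustration of the one-line workflow, the sketch below builds a document-term matrix from a small character vector with corp_or_dtm. The sample sentences are made up, and the argument names (from, type) are given as recalled from the package help, so check ?corp_or_dtm before relying on them. It also assumes chinese.misc and its dependencies (tm, jiebaR) are installed.

```r
# Minimal sketch; assumes chinese.misc (with tm and jiebaR) is installed.
library(chinese.misc)

# A small character vector standing in for a set of documents (made-up data).
x <- c(
  "drink a bottle of milk",
  "drink a cup of coffee",
  "drink some water"
)

# from = "v": the input is a character vector, not file or folder names.
# type = "dtm": return a document-term matrix rather than a corpus.
# Segmentation uses the package's default cutter (DEFAULT_cutter).
dtm <- corp_or_dtm(x, from = "v", type = "dtm")

# Rows are documents, columns are terms.
print(dim(dtm))
```

To read files instead, the same call is expected to accept folder or file names with from = "dir", in which case encoding detection would apply to each file; treat that as an assumption to confirm in the help page.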