gibasa R package [Documentation]

bind_lr

Bind importance of bigrams

as_tokens

Create a list of tokens

bind_tf_idf2

Bind term frequency and inverse document frequency

build_sys_dic

Build system dictionary

build_user_dic

Build user dictionary

collapse_tokens

Collapse sequences of tokens by condition

dict_index_sys

Build system dictionary

dict_index_user

Build user dictionary

dictionary_info

Get dictionary information

gbs_tokenize

Tokenize sentences using 'MeCab'

get_dict_features

Get dictionary features

get_transition_cost

Get transition cost between pos attributes

gibasa-package

gibasa: An Alternative 'Rcpp' Wrapper of 'MeCab'

is_blank

Check if scalars are blank

lex_density

Calculate lexical density

mute_tokens

Mute tokens by condition

ngram_tokenizer

Ngrams tokenizer

pack

Pack a data.frame of tokens

posDebugRcpp

Tokenizer for debug use

posParallelRcpp

Call tagger inside 'RcppParallel::parallelFor' and return a data.frame...

prettify

Prettify tokenized output

tokenize

Tokenize sentences using 'MeCab'

transition_cost

Get transition cost between pos attributes

A plain 'Rcpp' wrapper for 'MeCab' that can segment Chinese, Japanese, and Korean text into tokens. The main goal of this package is to provide an alternative to 'tidytext' using morphological analysis.

Maintainer: Akiru Kato
License: GPL (>= 3)
Last published: 2025-02-16

Version: 1.1.2
