pre_tokenizer_whitespace function
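This pre-tokenizer splits the input into runs of word characters and runs of punctuation, discarding the whitespace between them. It does so by matching the following regex: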
\w+|[^\w\s]+
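To make the pattern's behaviour concrete, here is a minimal sketch that applies the same regex with Python's standard `re` module. It only illustrates what the regex matches; it does not call `tok`, and the helper name `whitespace_split` is hypothetical. The offsets it reports are character positions from Python's matcher, which may not correspond to how the package itself reports offsets.

```python
import re

# The pattern quoted above: a run of word characters, or a run of
# characters that are neither word characters nor whitespace.
WHITESPACE_PATTERN = re.compile(r"\w+|[^\w\s]+")


def whitespace_split(text):
    """Return (piece, (start, end)) pairs for every match of the pattern.

    Hypothetical helper for illustration only; not part of the tok package.
    """
    return [(m.group(), m.span()) for m in WHITESPACE_PATTERN.finditer(text)]


pieces = [piece for piece, _ in whitespace_split("Hello, world! It's 3.14")]
print(pieces)
```

Under this pattern, consecutive punctuation stays together as one piece (`...` is a single match), an apostrophe separates `It` from `s`, and an underscore counts as a word character, so `foo_bar` is not split.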
tok package
Maintainer: Daniel Falbel
License: MIT + file LICENSE
Last published: 2025-09-30
Useful links
https://github.com/mlverse/tok/issues
https://github.com/mlverse/tok