pre_tokenizer_whitespace function

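This pre-tokenizer splits the input using the following regex: \w+|[^\w\s]+

Each match is either a run of word characters (letters, digits, underscore) or a run of characters that are neither word characters nor whitespace, such as punctuation. Whitespace matches neither alternative, so it separates pieces and is dropped from the output.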
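
To make the effect of the regex concrete, here is a minimal sketch that applies the same pattern with Python's standard re module. It is an illustration of the pattern only, not the package's implementation: the function name whitespace_pre_tokenize is hypothetical, and Python's Unicode handling of \w and \s may differ in edge cases from the regex engine the package actually uses.

```python
import re

# The pattern quoted in the description: a run of word characters,
# or a run of characters that are neither word characters nor whitespace.
WHITESPACE_PATTERN = re.compile(r"\w+|[^\w\s]+")


def whitespace_pre_tokenize(text):
    """Hypothetical helper: return (piece, (start, end)) for every regex match.

    Whitespace never matches either alternative, so it is skipped.
    """
    return [(m.group(0), m.span()) for m in WHITESPACE_PATTERN.finditer(text)]


if __name__ == "__main__":
    for piece, span in whitespace_pre_tokenize("Hello, world!! It's 2025."):
        print(piece, span)
```

Note that consecutive punctuation such as "!!" stays together as one piece, while an apostrophe inside a word splits it into three pieces ("It", "'", "s"), because the apostrophe is not a word character.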

  • Maintainer: Daniel Falbel
  • License: MIT + file LICENSE
  • Last published: 2025-09-30