tokenizers function

Split each input text into a sequence of tokens.
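
The page does not show the function's signature or tokenization rules, so the following is only a minimal sketch of what splitting texts into tokens can look like. The name `tokenize`, its `lowercase` parameter, and the word-character pattern are all assumptions made for illustration, not the documented API.

```python
import re

# Hypothetical helper: the name, parameters, and splitting rule are
# illustrative assumptions, not the documented function.
_WORD = re.compile(r"\w+")


def tokenize(texts, lowercase=True):
    """Return one list of word tokens per input text."""
    result = []
    for text in texts:
        if lowercase:
            text = text.lower()
        # Keep runs of letters, digits, and underscores; drop punctuation.
        result.append(_WORD.findall(text))
    return result


docs = ["Split texts into tokens.", "One document, one token list!"]
tokens = tokenize(docs)
```

Here `tokens` holds one token list per document, in input order. A regex on word characters is the simplest rule; a real tokenizer may instead split on whitespace, characters, n-grams, or language-specific boundaries.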