tokenize function

Recompute the tokens for a document or corpus