parse.corpus function

Perform pre-processing of a corpus (tokenization, n-gram extraction, etc.).
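
The two named steps can be sketched in a few lines. This is only an illustration in Python of what tokenization and n-gram extraction generally mean, not the implementation or interface of parse.corpus. The helper names `tokenize` and `ngrams`, the lowercasing, and the word-splitting rule are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed rule: runs of word characters)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical one-sentence "corpus" used only for demonstration.
tokens = tokenize("The quick brown fox")
bigrams = ngrams(tokens, 2)

print(tokens)   # ['the', 'quick', 'brown', 'fox']
print(bigrams)  # ['the quick', 'quick brown', 'brown fox']
```

A text shorter than n tokens yields an empty list, so callers do not need a separate length check.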