Breaks a string of concatenated words into individual words
This function inserts spaces into a string of words that lacks them, such as a hashtag or part of a URL. Punctuation or exotic characters can prevent a string from being broken, so it is best to limit input strings to lowercase alphanumeric characters. The input string must be in ASCII format.
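The input advice above can be applied before calling the function. The following is a minimal sketch, assuming a helper named prepareTextToBreak() that is purely illustrative and not part of this package: it lowercases the text, drops every character other than a-z, 0-9 and space, and trims leading and trailing spaces.

```r
# Hypothetical helper (NOT part of the package): normalise a string so that it
# only contains lowercase ASCII letters, digits and spaces.
prepareTextToBreak <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)

  # Lowercase first, then remove anything outside a-z, 0-9 and space.
  # perl = TRUE keeps the character ranges ASCII-only regardless of locale.
  cleaned <- gsub("[^a-z0-9 ]", "", tolower(x), perl = TRUE)

  # Leading and trailing spaces would be trimmed anyway; do it explicitly.
  trimws(cleaned)
}

prepareTextToBreak("#TestForWordBreak!")
#> [1] "testforwordbreak"
```

Interior spaces are kept on purpose, since they are treated as hard breaks in textToBreak.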
textToBreak: (character) Line of text to break into words. If spaces are present, they will be interpreted as hard breaks and maintained, except for leading or trailing spaces, which will be trimmed. Must be in ASCII format.
modelToUse: (character) Which language model to use, supported values: "title", "anchor", "query", or "body" (optional, default: "body")
orderOfNgram: (integer) Which order of N-gram to use, supported values: 1L, 2L, 3L, 4L, or 5L (optional, default: 5L)
maxNumOfCandidatesReturned: (integer) Maximum number of candidates to return (optional, default: 5L)
Returns
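An S3 object of class weblm. The candidate breakdowns are stored in the results data frame inside this object, one row per candidate, together with the log(probability) of each candidate. The object also holds the raw JSON response (json) and the request that was sent (request).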
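To illustrate how the results data frame might be used, the sketch below picks the most likely candidate and splits it into words. It assumes the column names words and probability shown in the printed table under Examples, and it uses a hand-built stand-in for textWords$results (values copied from that table) so that it runs without calling the service.

```r
# Stand-in for textWords$results, built by hand from the values printed in the
# Examples section; a real call returns this data frame inside the weblm object.
results <- data.frame(
  words = c("test for word break", "test for wordbreak", "testfor word break",
            "test forword break", "testfor wordbreak"),
  probability = c(-13.83, -14.63, -15.94, -16.72, -17.41),
  stringsAsFactors = FALSE
)

# A higher (less negative) log-probability means a more likely breakdown.
best <- results[which.max(results$probability), ]

best$words
#> [1] "test for word break"

# Split the best candidate into individual words.
bestWords <- strsplit(best$words, " ", fixed = TRUE)[[1]]
bestWords
#> [1] "test"  "for"   "word"  "break"
```

Using which.max() rather than taking the first row avoids relying on the candidates being returned in sorted order.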
Examples
## Not run:
tryCatch({

  # Break a sentence into words
  textWords <- weblmBreakIntoWords(
    textToBreak = "testforwordbreak",  # ASCII only
    modelToUse = "body",               # "title"|"anchor"|"query"|"body"(default)
    orderOfNgram = 5L,                 # 1L|2L|3L|4L|5L(default)
    maxNumOfCandidatesReturned = 5L    # Default: 5L
  )

  # Class and structure of textWords
  class(textWords)
  #> [1] "weblm"

  str(textWords, max.level = 1)
  #> List of 3
  #>  $ results:'data.frame': 5 obs. of  2 variables:
  #>  $ json   : chr "{"candidates":[{"words":"test for word break", __truncated__ }]}
  #>  $ request:List of 7
  #>   ..- attr(*, "class")= chr "request"
  #>  - attr(*, "class")= chr "weblm"

  # Print results (pandoc.table() is provided by the pander package)
  pandoc.table(textWords$results)
  #> ---------------------------------
  #>        words          probability
  #> ------------------- -------------
  #> test for word break    -13.83
  #>
  #> test for wordbreak     -14.63
  #>
  #> testfor word break     -15.94
  #>
  #> test forword break     -16.72
  #>
  #> testfor wordbreak      -17.41
  #> ---------------------------------

}, error = function(err) {

  # Print error
  geterrmessage()

})
## End(Not run)