Importing Interlinearized Corpora and Dictionaries as Produced by Descriptive Linguistics Software
Information about the structure of the LIFT XML format in order to eas...
List of the available pieces of information for each entry (i.e. column ...
Read an EMELD XML document containing an interlinearized corpus.
Parse a dictionary in XML LIFT (Lexicon Interchange FormaT) vocabulary...
Read a file in the format used in the Pangloss Collection.
Parse a Toolbox (SIL) text file.
Interlinearized glossed texts (IGT) are used in descriptive linguistics to represent the morphological analysis of a text through a morpheme-by-morpheme gloss. 'InterlineaR' provides a set of functions that target several popular IGT formats ('SIL Toolbox', 'EMELD XML') and turn an IGT into a set of data frames following a relational model, in which the tables represent the different linguistic units: texts, sentences, words, morphemes. The same pieces of software ('SIL FLEX', 'SIL Toolbox') typically produce dictionaries of the morphemes used in the glosses. 'InterlineaR' provides a function for turning the LIFT XML dictionary format into a set of data frames following a relational model that represents the dictionary entries, the sense(s) attached to the entries, the example(s) attached to the senses, etc.
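
To make the relational model described above more concrete, here is a minimal sketch in Python. It is not the package's own code or API. The XML layout (`text`, `sentence`, `word`, `morph` elements with `form` and `gloss` attributes), the function name `igt_to_tables`, and the column names are all hypothetical, chosen only for illustration. Real EMELD XML or Toolbox files are structured differently. The sketch flattens a nested IGT into four tables, one per linguistic unit, where each row carries the identifier of its parent unit.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified IGT document (NOT the actual EMELD XML schema).
SAMPLE = """
<document>
  <text title="Story">
    <sentence translation="The dogs sleep.">
      <word form="dogs">
        <morph form="dog" gloss="dog"/>
        <morph form="-s" gloss="PL"/>
      </word>
      <word form="sleep">
        <morph form="sleep" gloss="sleep"/>
      </word>
    </sentence>
  </text>
</document>
"""


def igt_to_tables(xml_string):
    """Flatten a nested IGT document into four relational tables.

    Each table is a list of dicts (one dict per row). Every row gets a
    sequential identifier and a foreign key pointing to its parent unit.
    """
    root = ET.fromstring(xml_string.strip())
    tables = {"texts": [], "sentences": [], "words": [], "morphemes": []}

    for text in root.findall("text"):
        text_id = len(tables["texts"]) + 1
        tables["texts"].append({"text_id": text_id, "title": text.get("title")})

        for sentence in text.findall("sentence"):
            sentence_id = len(tables["sentences"]) + 1
            tables["sentences"].append({
                "sentence_id": sentence_id,
                "text_id": text_id,  # foreign key to the texts table
                "translation": sentence.get("translation"),
            })

            for word in sentence.findall("word"):
                word_id = len(tables["words"]) + 1
                tables["words"].append({
                    "word_id": word_id,
                    "sentence_id": sentence_id,  # foreign key to sentences
                    "form": word.get("form"),
                })

                for morph in word.findall("morph"):
                    tables["morphemes"].append({
                        "morpheme_id": len(tables["morphemes"]) + 1,
                        "word_id": word_id,  # foreign key to words
                        "form": morph.get("form"),
                        "gloss": morph.get("gloss"),
                    })
    return tables


tables = igt_to_tables(SAMPLE)
for name, rows in tables.items():
    print(name, len(rows))
```

Because every morpheme row keeps the identifier of its word, and every word row that of its sentence, the tables can be joined back together or queried independently (for example, listing all morphemes glossed `PL` along with the sentences they occur in). The same idea would apply to a dictionary, with tables for entries, senses, and examples.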