Problems of morphological markup of words in corpus texts, and their inclusion in a computer program
DOI: https://doi.org/10.32523/2616-678X-2022-140-3-103-113
Keywords: corpus, corpus linguistics, text, morphology, conditional marking, markup, computer program
Abstract
The article gives a brief overview of the history of corpus creation in linguistics and of the characteristics of corpus linguistics, and sets out the theoretical and practical tasks and requirements of morphological markup.
Morphological markup of words in corpus texts was originally done by hand. The article explains the basic principles of morphological analysis of individual words and of the markings assigned to them. Morphological analysis is carried out mainly without reference to context. The article separately highlights various features encountered when analysing the morphological structure of parts of speech and when placing morphological markings on words.
Automatic analysis of the morphological system of the language is carried out by meeting several conditions step by step: 1) identifying the morphological structure of words (single-root word, affixes); 2) entering a list and pre-prepared affixes into the computer's memory; 3) entering into the computer's memory electronic texts of various language styles that contain morphological markings. Then, with the help of a computer program, the following work is performed: a) parts of speech are marked on words on which they have not yet been placed; b) while registry words are processed, isolated errors in the parts of speech assigned to them are corrected manually; c) only one of the homonyms, relative to one part of speech, is left in the list of registry words; d) differences between word-forming suffixes and formative affixes are identified.
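The stepwise procedure described above (a stored list of roots, a stored list of affixes, and context-free analysis of each word) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the article's lexicon, affix list, tag set, or algorithm, so the names `ROOTS`, `AFFIXES`, `Analysis`, and `analyze` are hypothetical, and the English toy data merely stands in for the real language material. The technique shown is simple right-to-left suffix stripping, not necessarily the authors' method.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (assumption): root -> part-of-speech tag.
ROOTS = {"book": "NOUN", "walk": "VERB", "quick": "ADJ"}

# Hypothetical affix list (assumption): suffix -> (kind, tag).
# "derivational" stands for word-forming suffixes,
# "inflectional" for formative affixes.
AFFIXES = {
    "s": ("inflectional", "PL"),
    "ed": ("inflectional", "PAST"),
    "ing": ("inflectional", "PROG"),
    "er": ("derivational", "AGENT"),
    "ly": ("derivational", "ADV"),
}


@dataclass
class Analysis:
    root: str
    pos: str
    affixes: list  # (suffix, kind, tag) tuples in surface order


def analyze(word):
    """Strip known suffixes from the right until a known root remains.

    Works on the isolated word only, without reference to context.
    Returns None when no analysis is found.
    """

    def strip(stem, found):
        if stem in ROOTS:
            return Analysis(stem, ROOTS[stem], found)
        for suffix, (kind, tag) in AFFIXES.items():
            if stem.endswith(suffix) and len(stem) > len(suffix):
                shorter = stem[: -len(suffix)]
                result = strip(shorter, [(suffix, kind, tag)] + found)
                if result is not None:
                    return result
        return None

    return strip(word.lower(), [])


if __name__ == "__main__":
    for w in ["books", "walkers", "quickly", "xyz"]:
        print(w, "->", analyze(w))
```

Words for which `analyze` returns `None` correspond to the cases the abstract leaves to manual correction. Keeping the affix kind next to each tag is one way to record the distinction between word-forming suffixes and formative affixes.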