Problems of morphological markup of words in corpus texts, and their inclusion in a computer program
DOI: https://doi.org/10.32523/2616-678X-2022-140-3-103-113
Keywords: corpus, corpus linguistics, text, morphology, conditional marking, markup, computer program
Abstract
The article gives a brief overview of the history of corpus creation in linguistics and of the characteristics of corpus linguistics, and sets out the theoretical and practical tasks and requirements of morphological markup.
Morphological markup of words in corpus texts was originally done manually. The article explains the basic principles of morphological analysis and markup of individual words. Morphological analysis is known to be carried out mainly without reference to context. The article separately highlights features encountered when analysing the morphological structure of parts of speech and when placing morphological markings on words.
Automatic parsing of the morphological system of the language is carried out in several steps in the computer's memory: 1) identifying the morphological structure of words (single-root word, affixes); 2) entering a list and pre-prepared affixes into the computer's memory; 3) entering electronic texts of various language styles, containing morphological markings, into the computer's memory. Then, with the help of a computer program, the following tasks are performed: a) assigning part-of-speech tags to words that lack them; b) manually correcting isolated errors in part-of-speech assignment while processing registry words; c) leaving only one of the homonyms, assigned to one part of speech, in the list of registry words; d) identifying differences between word-forming suffixes and formative affixes.
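The steps above can be illustrated with a minimal sketch of context-free suffix stripping. It is not the program described in the article: the stem lexicon, the affix lists, the tag names and the function `analyze` are all hypothetical, and English toy data is used only to keep the example readable. The sketch loads a stem list and two affix lists into memory, then returns every possible segmentation of a word, keeping word-forming (derivational) and formative (inflectional) suffixes apart.

```python
from typing import NamedTuple

# Hypothetical toy data, not taken from the article.
STEMS = {"read": ["VERB"], "book": ["NOUN", "VERB"], "teach": ["VERB"]}
DERIVATIONAL = {"er": "AGENT"}             # word-forming suffixes
INFLECTIONAL = {"s": "PL", "ing": "PROG"}  # formative affixes


class Analysis(NamedTuple):
    stem: str
    pos: str
    affixes: tuple  # (surface form, kind, tag), in left-to-right order


def analyze(word: str) -> list:
    """Return every context-free analysis of `word` as stem + suffix chain."""
    results = []

    def strip(rest, chain):
        # If the remainder is a known stem, record one analysis per POS tag.
        for pos in STEMS.get(rest, []):
            results.append(Analysis(rest, pos, tuple(chain)))
        # Try to peel one more suffix off the right edge and recurse.
        for kind, table in (("infl", INFLECTIONAL), ("deriv", DERIVATIONAL)):
            for suffix, tag in table.items():
                if rest.endswith(suffix) and len(rest) > len(suffix):
                    strip(rest[: -len(suffix)], [(suffix, kind, tag)] + chain)

    strip(word.lower(), [])
    return results


for analysis in analyze("readers") + analyze("books"):
    print(analysis)
```

Because the analysis ignores context, a homonymous stem such as the toy entry `book` yields one analysis per part of speech; choosing between them corresponds to the homonym-filtering task listed above. The sketch does not check whether a suffix is compatible with the stem's part of speech or whether suffixes occur in a valid order, which a real analyser would need to do.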
License
The academic journal “Bulletin of L.N. Gumilyov Eurasian National University. Philology Series” adheres to an Open Access policy for all published materials, based on the principle of free and equitable dissemination of scholarly knowledge. The Editorial Board believes that open access to research results contributes to the advancement of philological science, strengthens academic communication, and promotes the integration of national research into the international scientific community.
1. Free and Open Access
All articles published in the journal are made openly available on the official website of the journal and are accessible to all users without restrictions, registration, or payment.
Users are entitled to:
- freely read and download materials;
- copy and distribute the texts of publications;
- print articles;
- use materials for scientific and educational purposes, provided that proper attribution is given to the author(s) and the original source of publication.
2. Licensing
Journal materials are distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license:
https://creativecommons.org/licenses/by-nc/4.0/
This license permits the use, copying, distribution, and adaptation of the materials for non-commercial purposes, provided that appropriate credit is given to the author(s) and a link to the original publication is included.
3. Benefits of Open Access
The Open Access policy ensures:
- increased visibility and citation of scholarly publications;
- prompt dissemination of research findings in the fields of philology, linguistics, literary studies, and translation studies;
- expansion of international academic cooperation;
- access for readers to up-to-date scientific information without financial or technical barriers.
The Editorial Board is committed to ensuring transparency in editorial processes, maintaining high standards of peer review, and providing broad accessibility to research outcomes in the field of philological studies.





