Problems of morphological markup of words in corpus texts, and their inclusion in a computer program


Authors

  • S.K. Kulmanov, A. Baitursynuly Institute of Linguistics
  • A.A. Zhanabekova, A. Baitursynuly Institute of Linguistics
  • N.M. Ashimbayeva
  • A.Z.-G. Bisengali, A. Baitursynuly Institute of Linguistics
  • N.K. Shulenbayev, A. Baitursynuly Institute of Linguistics
  • B.K. Kordabay, A. Baitursynuly Institute of Linguistics

DOI:

https://doi.org/10.32523/2616-678X-2022-140-3-103-113

Keywords:

corpus, corpus linguistics, text, morphology, conditional marking, markup, computer program

Abstract

The article gives a brief overview of the history of corpus creation in linguistics and of the characteristics of corpus linguistics, and outlines the theoretical and practical tasks and requirements of morphological markup.

Morphological markup of words in corpus texts was originally done manually. The article explains the basic principles of morphological analysis and marking of individual words. Morphological analysis is carried out mainly without reference to context. The article also highlights features encountered when analysing the morphological structure of the parts of speech and when placing morphological markings on words.

Automatic parsing of the morphological system of the language is carried out in several steps: 1) identifying the morphological structure of words (root word, affixes); 2) entering the word list and the pre-prepared affixes into the computer's memory; 3) entering electronic texts of various language styles that contain morphological markings into the computer's memory. Then, with the help of a computer program, the following tasks are performed: a) assigning part-of-speech tags to words that lack them; b) manually correcting isolated errors in the part-of-speech tags while processing the registry words; c) leaving only one of the homonyms for each part of speech in the list of registry words; d) identifying differences between word-forming suffixes and formative affixes.
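
The steps above can be illustrated with a minimal Python sketch of context-free affix stripping. It is not the program described in the article: the tiny lexicon, the affix inventory, the tag names (NOUN, PL, LOC, NMLZ), and the function name analyze are all assumptions made for illustration, and real Kazakh morphology needs far larger lists and phonological rules.

```python
# Toy data only: the lexicon, affix inventory, and tag names are
# illustrative assumptions, not the inventory used in the article.
LEXICON = {"кітап": "NOUN", "бала": "NOUN"}  # registry word -> part of speech

# affix -> (kind, tag); "deriv" = word-forming suffix, "infl" = formative affix
AFFIXES = {
    "лық": ("deriv", "NMLZ"),
    "тар": ("infl", "PL"),
    "лар": ("infl", "PL"),
    "да": ("infl", "LOC"),
}


def analyze(word):
    """Strip known affixes from the right until a lexicon root remains.

    Returns (root, pos, [(affix, kind, tag), ...]) with affixes in surface
    order, or None if no analysis is found. Context is ignored.
    """
    if word in LEXICON:
        return word, LEXICON[word], []
    for affix, (kind, tag) in AFFIXES.items():
        if word.endswith(affix) and len(word) > len(affix):
            inner = analyze(word[: -len(affix)])
            if inner is not None:
                root, pos, chain = inner
                return root, pos, chain + [(affix, kind, tag)]
    return None


if __name__ == "__main__":
    for w in ["кітаптарда", "балалық", "балалар"]:
        print(w, analyze(w))
```

Each affix in the result carries a "deriv" or "infl" label, which is one simple way to keep word-forming suffixes apart from formative affixes. A word with no analysis returns None and would be left for manual tagging.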

Author Biographies

S.K. Kulmanov, A. Baitursynuly Institute of Linguistics

– Candidate of Philology, Associate Professor

A.A. Zhanabekova, A. Baitursynuly Institute of Linguistics

– Doctor of Philology, Professor

N.M. Ashimbayeva

– Candidate of Philological Sciences

A.Z.-G. Bisengali, A. Baitursynuly Institute of Linguistics

– Doctor of Philosophy (PhD)

N.K. Shulenbayev, A. Baitursynuly Institute of Linguistics

– Master of Humanities

B.K. Kordabay, A. Baitursynuly Institute of Linguistics

– Master of Humanities

Published

2022-12-17

How to Cite

Kulmanov, S.K., Zhanabekova, A.A., Ashimbayeva, N.M., Bisengali, A.Z.-G., Shulenbayev, N.K., & Kordabay, B.K. (2022). Problems of morphological markup of words in corpus texts, and their inclusion in a computer program. Bulletin of L.N. Gumilyov Eurasian National University. PHILOLOGY Series, 140(3), 103–113. https://doi.org/10.32523/2616-678X-2022-140-3-103-113

Section

Linguistics