Polish-Ukrainian Parallel Corpus PolUKR and its successor PolUKR-2

Full item record

dc.contributor.author: Kotsyba, Natalia
dc.contributor.organization: Polska Akademia Nauk [pl]
dc.description: Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka, eds. (2016). Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora. Warszawa: Instytut Lingwistyki Stosowanej, pp. 133-142. [en]
dc.description.abstract: The paper discusses the current stage of development of one aspect of an ongoing project to create electronic resources for the Ukrainian language. Parallel corpora form an important part of this project. The Polish-Ukrainian Parallel Corpus (PolUKR) was developed in 2004-2010, first at the Institute of Slavic Studies of the Polish Academy of Sciences and later at the Faculty of “Artes Liberales” of the University of Warsaw. The first two versions of PolUKR are available for online search at http://domeczek.pl/~polukr. PolUKR consists of texts originally written in either Polish or Ukrainian; that is, it contains no texts translated from a third language, only direct translations of its own texts. The corpus was aligned automatically at the sentence level, and the alignments were then corrected manually. Both the Polish and the Ukrainian sentences were supplied with a morphosyntactic annotation layer. The characteristic feature of PolUKR is its purpose-built apparatus of morphosyntactic categories, common to the two corpus languages, and the morphosyntactic tagsets based on it. The tagsets are also used in the multilingual European project MULTEXT-East (1996-2010), version 4 “MONDILEX”, available at http://nl.ijs.si/ME/V4/.
While the pilot versions of PolUKR concentrated on developing corpus-building technologies, in both their technical and theoretical linguistic aspects, the new version, currently being developed in cooperation with the National University of Lviv and Lviv Polytechnical University in Ukraine, aims at: 1) first of all, extending the corpus to 30 million words (as before, with the greatest possible emphasis on original Polish or Ukrainian texts, but without a strict limitation in this respect); 2) optimizing the morphosyntactic description of Ukrainian, i.e., disambiguating ambiguous interpretations and extending the grammatical dictionary to cover new, unknown words. Work on shallow syntax for Ukrainian is also planned. PolUKR-2 will serve as the basic corpus resource for creating a large Ukrainian-Polish dictionary with ca. 80 thousand entries. [en]
dc.publisher: Instytut Lingwistyki Stosowanej UW [pl]
dc.rights: Fair use (Dozwolony użytek) [*]
dc.subject: korpus równoległy [pl]
dc.subject: język polski [pl]
dc.subject: język ukraiński [pl]
dc.subject: tagset morfoskładniowy [pl]
dc.subject: parallel corpus [en]
dc.subject: morphosyntactic tagset [en]
dc.title: Polsko-Ukraiński Korpus Równoległy PolUKR i jego następca PolUKR-2 [pl]
dc.title.alternative: Polish-Ukrainian Parallel Corpus PolUKR and its successor PolUKR-2 [en]
Files for this record
Original bundle
Name: 08_Kotsyba.pdf
Size: 1.01 MB
Format: Adobe Portable Document Format