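Polsko-litewskie korpusy równoległe. Elementy anotacji semantycznej z zakresu modalności możliwościowej i kwantyfikacji zakresowej [Polish-Lithuanian parallel corpora. Elements of semantic annotation for possibility modality and scope quantification]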

The authors present two Polish-Lithuanian parallel corpora: (1) the experimental EKorpPL-LT and (2) KorpPL-LT_CLARIN. EKorpPL-LT is the first extended bilingual Polish-Lithuanian corpus; its resources are divided into two subcorpora, parallel and comparable. The parallel subcorpus is widely used in contrastive studies carried out by the Corpus Linguistics and Semantics Team at the Institute of Slavic Studies, Polish Academy of Sciences. It contains a variety of texts that are mutual translations between the two languages. KorpPL-LT_CLARIN is based on extensive fragments of translated fiction and specialist texts. It is created within the framework of the Polish scientific consortium that forms part of the pan-European research infrastructure CLARIN. For both corpora, the basic applications established by their authors are presented. Next, the authors describe the archaic nature of the Lithuanian language, which benefits the structure of multilingual corpora. To this end, the basic assumptions of semantic categories such as (a) definiteness/indefiniteness and (b) modality, (b1) hypothetical and (b2) imperceptive, are described. Then, within these categories and on the basis of the distinctive features of Lithuanian, the possibility of extending the description of the Polish corpus resources is discussed. The authors present examples of a new semantic annotation (developed by Violetta Koseska and Roman Roszko for scope quantification, and by Danuta Roszko and Roman Roszko for modality). The authors distinguish the following three groups of semantic units:
• a neutral degree (I1) and an enhanced degree (I2) of imperceptiveness,
• a degree of the lowest probability (H1), particular degrees of growing probability (H2–H5) and a degree of the highest probability (H6) of hypothetical modality,
• uniqueness, existentiality (E1), real existentiality, habitual universality and real universality (categories of scope quantification).
The authors assume that the conservative nature of the Lithuanian language, which manifests itself in (i) the stability of forms, (ii) the relations between a form and its function, and (iii) a narrow specialization of forms that is much more advanced than in Polish, not only makes it possible to extend the description of the resources, but also considerably affects the development of linguistics and of all applied sciences based on language (such as language teaching, and traditional and machine translation).
Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka, eds. (2016). Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora. Warszawa: Instytut Lingwistyki Stosowanej, pp. 120-132.
