The Polish-Russian Corpus of the University of Warsaw (Korpus Polsko-Rosyjski Uniwersytetu Warszawskiego)
Abstract
The Polish-Russian Parallel Corpus has been developed at the University of Warsaw (the Faculty of Polish Studies and the Institute of Russian Studies) in co-operation with the National Corpus of Polish and the Russian National Corpus. The corpus consists of Russian and Polish literary classics (90%), non-fiction books and legal texts (5%), religious texts, i.e. Bible translations (4%), and contemporary press articles (1%). The great Russian realist novels of the 19th century, together with the modern Russian books that are most popular in Poland, make up a significant part of the corpus. We have also included those works of Polish literature that are most widely known in Russia. Identifying loci communes in Russian and Polish culture was an important extra-linguistic aim of the corpus project. Unfortunately, the novels of writers such as Dostoevsky and Tolstoy were translated into Polish only in the 1930s, and copyright protection for these translations (70 years after the author's death) is still in force. Some of the translators' heirs did not grant permission to include the texts in the corpus. The corpus's annotation and search capabilities derive from the co-operation with the two national corpora. However, not all annotation levels used in the source corpora will be applied in the parallel corpus. The two national corpora differ in the grammatical disambiguation of annotated word forms: in the National Corpus of Polish all texts are disambiguated, whereas in the Russian National Corpus only some of them have undergone this procedure. The search interface is based on the user-friendly interface of the Russian National Corpus. It allows users to formulate lexical and grammatical queries with the tags from the tag sets of both national corpora, and it is easy to use for users of either corpus. The second part of the paper presents some practical applications of the corpus in linguistic research, translation practice and foreign language teaching.
The first case study is the Russian translation strategy for the Polish indefinite numeral kilkanaście ‘over a dozen’; the second is the Polish translation strategy for the Russian adjectives russkij and rossijskij ‘Russian’.
Description
Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka (eds.) (2016). Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora. Warszawa: Instytut Lingwistyki Stosowanej, pp. 84–95.