ICM Conference Papers

Recent Submissions

  • Item
    Evaluating Methods of Transferring Large Datasets
    (Springer Nature, 2022) Kopeć, Jakub; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw
    Our society critically depends on data, and on Big Data in particular. Humanity generates and moves larger volumes of data than ever before, and the growth is accelerating. The goal of this research is to evaluate tools used for the transfer of large volumes of data. Bulk data transfer is a complex endeavour that requires not only sufficient network infrastructure, but also appropriate software, computing power and storage resources. We report on a series of storage benchmarks conducted using the recently developed elbencho tool. The tests were conducted in order to understand and avoid I/O bottlenecks during data transfer operations. Subsequently, the performance of Ethernet and InfiniBand networks was compared using the Ohio State University bandwidth benchmark (OSU BW) and the iperf3 tool. For comparison, we also tested the traditional (very inefficient) Linux scp and rsync commands, as well as tools designed specifically to transfer large datasets more efficiently: bbcp and MDTMFTP. Additionally, the impact of simultaneous multi-threading and Ethernet jumbo frames on the transfer rate was evaluated.
  • Item
    Otwieranie nauki w Polsce (Opening Up Science in Poland)
    (Stowarzyszenie EBIB, 2012) Szprot, Jakub; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw
    The paper presents projects and initiatives related to open science in Poland, focusing on the areas that are key in the current situation: open access to scholarly content (open access journals, open repositories), the IT infrastructure of open science (Polish projects and European projects involving Polish partners), and legal tools for openness in science (open licences, open mandates). It discusses the role of individual actors (government, scientific and research institutions, and civil society organisations) in activities promoting open science. It also attempts to identify the basic problems and the most significant challenges facing the open science movement in Poland.
  • Item
    Evaluation of Features for Author Name Disambiguation Using Linear Support Vector Machines
    (IEEE Computer Society Conference Publishing Services, 2012-03-27) Dendek, Piotr Jan; Bolikowski, Łukasz; Łukasik, Michał; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw
    Author name disambiguation makes it possible to distinguish between two or more authors sharing the same name. In a previous paper, we proposed a name disambiguation framework in which, for each author name in each article, we build a context consisting of classification codes, bibliographic references, co-authors, etc. Then, by pairwise comparison of contexts, we group contributions likely to refer to the same person. In this paper we examine which elements of the context are most effective in author name disambiguation. We employ linear Support Vector Machines (SVM) to find the most influential features.
  • Item
    A modular metadata extraction system for born-digital articles
    (2012-03-27) Tkaczyk, Dominika; Bolikowski, Łukasz; Czeczko, Artur; Rusek, Krzysztof; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw
    We present a comprehensive system for extracting metadata from scholarly articles. In our approach the entire document is inspected, including the headers and footers of all pages as well as the bibliographic references. The system is based on a modular workflow which allows for evaluation, unit testing and replacement of individual components. The workflow is optimized for processing born-digital documents, but may accept scanned document images as well. The machine learning approaches we have chosen for solving individual tasks increase the ability to adapt to new document layouts and formats. The evaluation tests we performed showed good results for both the individual implementations and the entire metadata extraction process.
  • Item
    GROTOAP: GROund Truth for Open Access Publications
    (ACM, 2012-06) Tkaczyk, Dominika; Czeczko, Artur; Rusek, Krzysztof; Bolikowski, Łukasz; Bogacewicz, Roman; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw
    The field of digital document content analysis includes many important tasks, for example page segmentation and zone classification. It is impossible to build effective solutions for such problems and evaluate their performance without a reliable test set that contains both input documents and the expected results of segmentation and classification. In this paper we present GROTOAP, a test set useful for training and performance evaluation of page segmentation and zone classification tasks. The test set contains input articles in digital form and the corresponding ground truth files. All input documents included in the test set have been selected from the DOAJ database, which indexes articles published under the CC-BY license. The whole test set is available under the same license.