Evaluating Methods of Transferring Large Datasets
Abstract
Our society critically depends on data, and on Big Data in particular. Humanity generates and moves larger data volumes than ever before, and their growth is continuously accelerating. The goal of this research is to evaluate tools used for transferring large volumes of data. Bulk data transfer is a complex endeavour that requires not only sufficient network infrastructure, but also appropriate software, computing power, and storage resources. We report on a series of storage benchmarks conducted with the recently developed elbencho tool. The tests were run with the objective of understanding and avoiding I/O bottlenecks during data transfer operations. Subsequently, the performance of Ethernet and InfiniBand networks was compared using the Ohio State University bandwidth benchmark (OSU BW) and the iperf3 tool. For comparison we also tested the traditional (and very inefficient) Linux scp and rsync commands, as well as tools designed specifically to transfer large datasets more efficiently: bbcp and MDTMFTP. Additionally, the impact of simultaneous multi-threading and Ethernet jumbo frames on the transfer rate was evaluated.
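As a minimal sketch (not taken from the paper) of the kind of network measurement described above, the following Python snippet drives an iperf3 client and reads its JSON report. The `-c`, `-J`, `-P`, and `-t` flags are standard iperf3 options; the hostname `dtn01` and the stream count are placeholders, and an `iperf3 -s` server is assumed to be listening on the target host.

```python
import json
import subprocess


def iperf3_throughput_gbps(server: str, streams: int = 4, seconds: int = 10) -> float:
    """Run an iperf3 client against `server` and return receive throughput in Gbit/s.

    Assumes an `iperf3 -s` server is already running on the target host.
    """
    # -J requests JSON output; -P opens parallel streams, which is often
    # needed to saturate high-bandwidth links with a single benchmark run.
    result = subprocess.run(
        ["iperf3", "-c", server, "-J", "-P", str(streams), "-t", str(seconds)],
        capture_output=True,
        text=True,
        check=True,
    )
    report = json.loads(result.stdout)
    # For TCP tests, the aggregate receive rate is reported under end.sum_received.
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


if __name__ == "__main__":
    # "dtn01" is a hypothetical data transfer node used here for illustration.
    print(f"{iperf3_throughput_gbps('dtn01'):.2f} Gbit/s")
```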
Citation
Kopeć, J. (2022). Evaluating Methods of Transferring Large Datasets. In: Panda, D.K., Sullivan, M. (eds) Supercomputing Frontiers. SCFA 2022. Lecture Notes in Computer Science, vol 13214. Springer, Cham. https://doi.org/10.1007/978-3-031-10419-0_7