Ruben Verborgh

Parallel RDF generation from heterogeneous big data

Gerald Haesendonck, Wouter Maroy, Pieter Heyvaert, Ruben Verborgh, and Anastasia Dimou

To unlock the value of data that is increasingly available in high volumes, we need flexible ways to integrate data across different sources. While semantic integration can be provided through RDF generation, current generators do not scale sufficiently in terms of volume: they are limited by memory constraints. Therefore, we developed the RMLStreamer, a generator that parallelizes the ingestion and mapping tasks of RDF generation across multiple instances. In this paper, we analyze which aspects are parallelizable and introduce an approach for parallel RDF generation. We describe how we implemented the proposed approach in the RMLStreamer, and how the resulting scaling behavior compares to that of other RDF generators. Through parallel ingestion, the RMLStreamer ingests data at a 50% faster rate than existing generators.

Published in 2019 in Proceedings of the International Workshop on Semantic Big Data.

Cite this article in your work

Cite this article easily using its BibTeX entry:

@inproceedings{haesendonck_sbd_2019,
  author = {Haesendonck, Gerald and Maroy, Wouter and Heyvaert, Pieter and Verborgh, Ruben and Dimou, Anastasia},
  title = {Parallel {RDF} generation from heterogeneous big data},
  booktitle = {Proceedings of the International Workshop on Semantic Big Data},
  year = 2019,
  month = jul,
  isbn = {978-1-4503-6766-0},
  doi = {10.1145/3323878.3325802},
  url = {https://dl.acm.org/authorize?N680652},
}

Alternatively, pick a reference of your choice below:

ACM
Gerald Haesendonck, Wouter Maroy, Pieter Heyvaert, Ruben Verborgh, and Anastasia Dimou. 2019. Parallel RDF generation from heterogeneous big data. In Proceedings of the International Workshop on Semantic Big Data.
APA
Haesendonck, G., Maroy, W., Heyvaert, P., Verborgh, R., & Dimou, A. (2019, July). Parallel RDF generation from heterogeneous big data. Proceedings of the International Workshop on Semantic Big Data.
IEEE
G. Haesendonck, W. Maroy, P. Heyvaert, R. Verborgh, and A. Dimou, “Parallel RDF generation from heterogeneous big data,” in Proceedings of the International Workshop on Semantic Big Data, 2019.
LNCS
Haesendonck, G., Maroy, W., Heyvaert, P., Verborgh, R., Dimou, A.: Parallel RDF generation from heterogeneous big data. In: Proceedings of the International Workshop on Semantic Big Data (2019).
MLA
Haesendonck, Gerald, et al. “Parallel RDF Generation from Heterogeneous Big Data.” Proceedings of the International Workshop on Semantic Big Data, 2019.