Ruben Verborgh

Big Linked Data ETL Benchmark on Cloud Commodity Hardware

by Dieter De Witte, Laurens De Vocht, Ruben Verborgh, Kenny Knecht, Filip Pattyn, Hans Constandt, Erik Mannens, and Rik Van de Walle

Linked Data storage solutions often optimize for low-latency querying and quick responsiveness. Meanwhile, in the back-end, offline ETL processes take care of integrating and preparing the data. In this paper, we explain a workflow and the results of a benchmark that examines which Linked Data storage solution and setup should be chosen for different dataset sizes to optimize the cost-effectiveness of the entire ETL process. The benchmark executes diversified stress tests on the storage solutions. The results include an in-depth analysis of four mature Linked Data solutions with commercial support and full SPARQL 1.1 compliance. Whereas traditional benchmark studies generally deploy the triple stores on premises using high-end hardware, this benchmark uses publicly available cloud machine images for reproducibility and runs on commodity hardware. All stores are tested using their default configuration. In this setting, Virtuoso shows the best performance in general. The other three stores show competitive results and have disjoint areas of excellence. Finally, it is shown that each store’s performance heavily depends on the structural properties of the queries, giving an indication of where vendors can focus their optimization efforts.

Published in 2016 in Proceedings of the International Workshop on Semantic Big Data.

Cite this article in your work

Cite this article easily using its BibTeX entry:

@inproceedings{dewitte_sbd_2016,
  author = {De Witte, Dieter and De Vocht, Laurens and Verborgh, Ruben and Knecht, Kenny and Pattyn, Filip and Constandt, Hans and Mannens, Erik and Van de Walle, Rik},
  title = {Big {Linked Data} {ETL} Benchmark on Cloud Commodity Hardware},
  booktitle = {Proceedings of the International Workshop on Semantic Big Data},
  year = 2016,
  month = jun,
  isbn = {978-1-4503-4299-5},
  pages = {12:1--12:6},
  doi = {10.1145/2928294.2928304},
  publisher = {ACM},
  address = {New York, NY, USA},
  url = {https://dl.acm.org/authorize?N20620},
}

Alternatively, pick a reference of your choice below:

IEEE
D. De Witte, L. De Vocht, R. Verborgh, K. Knecht, F. Pattyn, H. Constandt, E. Mannens, and R. Van de Walle, “Big Linked Data ETL Benchmark on Cloud Commodity Hardware,” in Proceedings of the International Workshop on Semantic Big Data, New York, NY, USA, 2016, pp. 12:1–12:6.
ACM
Dieter De Witte et al. 2016. Big Linked Data ETL Benchmark on Cloud Commodity Hardware. In Proceedings of the International Workshop on Semantic Big Data. New York, NY, USA: ACM, 12:1–12:6.
LNCS
De Witte, D., De Vocht, L., Verborgh, R., Knecht, K., Pattyn, F., Constandt, H., Mannens, E., Van de Walle, R.: Big Linked Data ETL Benchmark on Cloud Commodity Hardware. In: Proceedings of the International Workshop on Semantic Big Data. pp. 12:1–12:6. ACM, New York, NY, USA (2016).
APA
De Witte, D., De Vocht, L., Verborgh, R., Knecht, K., Pattyn, F., Constandt, H., … Van de Walle, R. (2016). Big Linked Data ETL Benchmark on Cloud Commodity Hardware. In Proceedings of the International Workshop on Semantic Big Data (pp. 12:1–12:6). New York, NY, USA: ACM.
MLA
De Witte, Dieter et al. “Big Linked Data ETL Benchmark on Cloud Commodity Hardware.” Proceedings of the International Workshop on Semantic Big Data. New York, NY, USA: ACM, 2016. 12:1–12:6. Print.