Big Linked Data ETL Benchmark on Cloud Commodity Hardware
Linked Data storage solutions often optimize for low-latency querying and quick responsiveness. Meanwhile, in the back-end, offline ETL processes take care of integrating and preparing the data. In this paper we explain a workflow and the results of a benchmark that examines which Linked Data storage solution and setup should be chosen for different dataset sizes to optimize the cost-effectiveness of the entire ETL process. The benchmark executes diversified stress tests on the storage solutions. The results include an in-depth analysis of four mature Linked Data solutions with commercial support and full SPARQL 1.1 compliance. Whereas traditional benchmark studies generally deploy the triple stores on premises using high-end hardware, this benchmark uses publicly available cloud machine images for reproducibility and runs on commodity hardware. All stores are tested using their default configuration. In this setting, Virtuoso shows the best performance in general. The other three stores show competitive results and have disjoint areas of excellence. Finally, it is shown that each store’s performance heavily depends on the structural properties of the queries, giving an indication of where vendors can focus their optimization efforts.
Published in 2016 in Proceedings of the International Workshop on Semantic Big Data.
- Linked Data
- SPARQL
Cite this article in your work
Cite this article easily using its BibTeX entry:
@inproceedings{dewitte_sbd_2016,
author = {De Witte, Dieter and De Vocht, Laurens and Verborgh, Ruben and Knecht, Kenny and Pattyn, Filip and Constandt, Hans and Mannens, Erik and Van de Walle, Rik},
title = {Big {Linked Data} {ETL} Benchmark on Cloud Commodity Hardware},
booktitle = {Proceedings of the International Workshop on Semantic Big Data},
year = 2016,
month = jun,
isbn = {978-1-4503-4299-5},
pages = {12:1--12:6},
doi = {10.1145/2928294.2928304},
publisher = {ACM},
address = {New York, NY, USA},
url = {https://dl.acm.org/authorize?N20620},
}
Alternatively, pick a reference of your choice below:
- ACM
- Dieter De Witte, Laurens De Vocht, Ruben Verborgh, Kenny Knecht, Filip Pattyn, Hans Constandt, Erik Mannens, and Rik Van de Walle. 2016. Big Linked Data ETL Benchmark on Cloud Commodity Hardware. In Proceedings of the International Workshop on Semantic Big Data, ACM, New York, NY, USA, 12:1–12:6.
- APA
- De Witte, D., De Vocht, L., Verborgh, R., Knecht, K., Pattyn, F., Constandt, H., Mannens, E., & Van de Walle, R. (2016). Big Linked Data ETL Benchmark on Cloud Commodity Hardware. Proceedings of the International Workshop on Semantic Big Data, 12:1–12:6.
- IEEE
- D. De Witte et al., “Big Linked Data ETL Benchmark on Cloud Commodity Hardware,” in Proceedings of the International Workshop on Semantic Big Data, New York, NY, USA, 2016, pp. 12:1–12:6.
- LNCS
- De Witte, D., De Vocht, L., Verborgh, R., Knecht, K., Pattyn, F., Constandt, H., Mannens, E., Van de Walle, R.: Big Linked Data ETL Benchmark on Cloud Commodity Hardware. In: Proceedings of the International Workshop on Semantic Big Data. pp. 12:1–12:6. ACM, New York, NY, USA (2016).
- MLA
- De Witte, Dieter, et al. “Big Linked Data ETL Benchmark on Cloud Commodity Hardware.” Proceedings of the International Workshop on Semantic Big Data, ACM, 2016, pp. 12:1–12:6.