Ruben Verborgh

Triple Storage for Random-Access Versioned Querying of RDF Archives

Ruben Taelman, Miel Vander Sande, Joachim Van Herwegen, Erik Mannens, and Ruben Verborgh

When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.

Published in 2019 in Journal of Web Semantics.

Read this article online

Cite this article in your work

Cite this article easily using its BibTeX entry:

@article{taelman_jws_2019,
  title = {Triple Storage for Random-Access Versioned Querying of RDF Archives},
  author = {Taelman, Ruben and Vander Sande, Miel and Van Herwegen, Joachim and Mannens, Erik and Verborgh, Ruben},
  journal = {Journal of Web Semantics},
  volume = 54,
  month = jan,
  year = 2019,
  pages = {4--28},
  doi = {10.1016/j.websem.2018.08.001},
  url = {https://rdfostrich.github.io/article-jws2018-ostrich/},
}

Alternatively, pick a reference of your choice below:

ACM
Ruben Taelman, Miel Vander Sande, Joachim Van Herwegen, Erik Mannens, and Ruben Verborgh. 2019. Triple Storage for Random-Access Versioned Querying of RDF Archives. Journal of Web Semantics 54 (January 2019), 4–28.
APA
Taelman, R., Vander Sande, M., Van Herwegen, J., Mannens, E., & Verborgh, R. (2019). Triple Storage for Random-Access Versioned Querying of RDF Archives. Journal of Web Semantics, 54, 4–28.
IEEE
R. Taelman, M. Vander Sande, J. Van Herwegen, E. Mannens, and R. Verborgh, “Triple Storage for Random-Access Versioned Querying of RDF Archives,” Journal of Web Semantics, vol. 54, pp. 4–28, Jan. 2019.
LNCS
Taelman, R., Vander Sande, M., Van Herwegen, J., Mannens, E., Verborgh, R.: Triple Storage for Random-Access Versioned Querying of RDF Archives. Journal of Web Semantics. 54, 4–28 (2019).
MLA
Taelman, Ruben, et al. “Triple Storage for Random-Access Versioned Querying of RDF Archives.” Journal of Web Semantics, vol. 54, Jan. 2019, pp. 4–28.
