Scaling out federated queries for life science data in production

De Witte, Dieter; De Vocht, Laurens; Knecht, Kenny; Pattyn, Filip; Constandt, Hans; Mannens, Erik; Verborgh, Ruben

There exists an abundance of Linked Data storage solutions, but only a few meet the requirements of a production environment with interlinked life sciences data. In such environments, a triple store has to support complex SPARQL queries and handle large datasets with hundreds of millions of triples. The Ontoforce platform DisQover offers federated search for life sciences. It relies on complex federated queries over open life science data, which are run in an ETL pipeline to anticipate user actions in its exploratory search interface. Different state-of-the-art approaches for scaling out are compared, both in terms of their ability to execute the queries and in terms of ETL pipeline performance. This paper analyzes and discusses the features of the datasets and query mixes. An in-depth, per-query analysis reveals the strengths and weaknesses of each approach with respect to certain query types.

Published in 2016 in Proceedings of the 9th International Conference on Semantic Web Applications and Tools for Life Sciences.

Keywords:

Linked Data
SPARQL

Cite this article in your work

Cite this article easily using its BibTeX entry:

@inproceedings{dewitte_swat4ls_2016,
  author = {De Witte, Dieter and De Vocht, Laurens and Knecht, Kenny and Pattyn, Filip and Constandt, Hans and Mannens, Erik and Verborgh, Ruben},
  title = {Scaling out federated queries for life science data in production},
  booktitle = {Proceedings of the 9th International Conference on Semantic Web Applications and Tools for Life Sciences},
  year = 2016,
  month = dec,
}

Alternatively, pick a reference of your choice below:

ACM: Dieter De Witte, Laurens De Vocht, Kenny Knecht, Filip Pattyn, Hans Constandt, Erik Mannens, and Ruben Verborgh. 2016. Scaling out federated queries for life science data in production. In Proceedings of the 9th International Conference on Semantic Web Applications and Tools for Life Sciences.
APA: De Witte, D., De Vocht, L., Knecht, K., Pattyn, F., Constandt, H., Mannens, E., & Verborgh, R. (2016, December). Scaling out federated queries for life science data in production. Proceedings of the 9th International Conference on Semantic Web Applications and Tools for Life Sciences.
IEEE: D. De Witte et al., “Scaling out federated queries for life science data in production,” in Proceedings of the 9th International Conference on Semantic Web Applications and Tools for Life Sciences, 2016.
LNCS: De Witte, D., De Vocht, L., Knecht, K., Pattyn, F., Constandt, H., Mannens, E., Verborgh, R.: Scaling out federated queries for life science data in production. In: Proceedings of the 9th International Conference on Semantic Web Applications and Tools for Life Sciences (2016).
MLA: De Witte, Dieter, et al. “Scaling out Federated Queries for Life Science Data in Production.” Proceedings of the 9th International Conference on Semantic Web Applications and Tools for Life Sciences, 2016.