Detailed Provenance Capture of Data Processing

by Ben De Meester, Anastasia Dimou, Ruben Verborgh, and Erik Mannens

A large part of scientific output entails computational experiments, e.g., processing data to generate new data. However, this generation process is only documented in human-readable form or as a software repository. This inhibits reproducibility and comparability, as current documentation solutions do not provide detailed metadata and rely on the availability of specific software environments. This paper proposes an automatic capturing mechanism for interchangeable and implementation-independent metadata and provenance that includes data processing. Using declarative mapping documents to describe the computational experiment, term-level provenance can be captured automatically for both schema and data transformations, storing both the software tools used and the input-output pairs of the data processing executions. This approach is applied to mapping documents described using RML and FnO, and implemented in the RMLMapper. The captured metadata can be used to more easily share, reproduce, and compare the dataset generation process across software environments.
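
To make the idea of storing the tool used together with the input-output pairs of each execution more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: it does not use RML, FnO, or the RMLMapper, and the names `ProvenanceLog`, `ExecutionRecord`, `to_upper_case`, and the tool label `example-mapper 0.1` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from functools import wraps
from typing import Any, Callable, Dict, List


@dataclass
class ExecutionRecord:
    """One data-processing execution: which function ran, with which tool,
    on which inputs, and what it produced."""
    function: str
    tool: str
    inputs: Dict[str, Any]
    output: Any
    started_at: str
    ended_at: str


@dataclass
class ProvenanceLog:
    """Collects execution records for one (hypothetical) software tool."""
    tool: str
    records: List[ExecutionRecord] = field(default_factory=list)

    def capture(self, func: Callable[..., Any]) -> Callable[..., Any]:
        """Wrap a transformation so every call is recorded automatically."""
        @wraps(func)
        def wrapper(**kwargs: Any) -> Any:
            started = datetime.now(timezone.utc).isoformat()
            result = func(**kwargs)
            ended = datetime.now(timezone.utc).isoformat()
            self.records.append(
                ExecutionRecord(
                    function=func.__name__,
                    tool=self.tool,
                    inputs=dict(kwargs),
                    output=result,
                    started_at=started,
                    ended_at=ended,
                )
            )
            return result
        return wrapper


# Hypothetical tool label; a real system would record the actual software used.
log = ProvenanceLog(tool="example-mapper 0.1")


@log.capture
def to_upper_case(value: str) -> str:
    """A stand-in data transformation applied to a single term."""
    return value.upper()


names = [to_upper_case(value=v) for v in ["ben", "anastasia"]]
```

After these calls, `log.records` holds one record per transformed term, so each generated value can be traced back to the function, tool, and input that produced it. The paper's approach derives such metadata from declarative mapping documents rather than from decorators; the decorator here is just a compact way to show the kind of information being captured.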

Published in 2017 in Proceedings of the 1st Workshop on Enabling Open Semantic Science.

Keywords: provenance, RML, data transformations, FnO, metadata

Cite this article in your publications

Use the BibTeX entry to easily refer to this article, or any of these snippets:

IEEE
B. De Meester, A. Dimou, R. Verborgh, and E. Mannens, “Detailed Provenance Capture of Data Processing,” in Proceedings of the 1st Workshop on Enabling Open Semantic Science, 2017, vol. 1931.
ACM
Ben De Meester, Anastasia Dimou, Ruben Verborgh, and Erik Mannens. 2017. Detailed Provenance Capture of Data Processing. In Proceedings of the 1st Workshop on Enabling Open Semantic Science. CEUR Workshop Proceedings.
LNCS
De Meester, B., Dimou, A., Verborgh, R., Mannens, E.: Detailed Provenance Capture of Data Processing. In: Proceedings of the 1st Workshop on Enabling Open Semantic Science (2017).
APA
De Meester, B., Dimou, A., Verborgh, R., & Mannens, E. (2017). Detailed Provenance Capture of Data Processing. In Proceedings of the 1st Workshop on Enabling Open Semantic Science (Vol. 1931).
MLA
De Meester, Ben et al. “Detailed Provenance Capture of Data Processing.” Proceedings of the 1st Workshop on Enabling Open Semantic Science. Vol. 1931. 2017. Print. CEUR Workshop Proceedings.