Detailed Provenance Capture of Data Processing
A large part of scientific output entails computational experiments, e.g., processing data to generate new data. However, this generation process is documented only in human-readable form or as a software repository. This inhibits reproducibility and comparability, as current documentation solutions do not provide detailed metadata and rely on the availability of specific software environments. This paper proposes a mechanism that automatically captures interchangeable, implementation-independent metadata and provenance, including for data processing. When declarative mapping documents describe the computational experiment, term-level provenance can be captured automatically for both schema and data transformations, recording both the software tools used and the input-output pairs of the data processing executions. This approach is applied to mapping documents described using RML and FnO, and implemented in the RMLMapper. The captured metadata can be used to share, reproduce, and compare the dataset generation process more easily across software environments.
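As a rough illustration of the idea of recording the tool used together with the input-output pair of each data processing execution, consider the minimal Python sketch below. It is not the RMLMapper implementation and does not use RML, FnO, or any provenance vocabulary; the names `ExecutionRecord`, `ProvenanceLog`, and `to_upper` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ExecutionRecord:
    """One captured execution: which function ran, on what, producing what."""
    function_name: str      # stands in for the "used software tool" (assumption)
    inputs: Dict[str, Any]  # named input values of this execution
    output: Any             # generated value, e.g. one term of the new dataset


@dataclass
class ProvenanceLog:
    """Collects one record per data transformation execution."""
    records: List[ExecutionRecord] = field(default_factory=list)

    def run(self, func: Callable[..., Any], **inputs: Any) -> Any:
        """Execute func and record its input-output pair before returning."""
        output = func(**inputs)
        self.records.append(ExecutionRecord(func.__name__, dict(inputs), output))
        return output


def to_upper(value: str) -> str:
    """Hypothetical data transformation applied while generating a term."""
    return value.upper()


log = ProvenanceLog()
rows = [{"name": "alice"}, {"name": "bob"}]

# Each generated term gets its own record, which is what makes the
# captured provenance term-level rather than dataset-level.
terms = [log.run(to_upper, value=row["name"]) for row in rows]
```

Because every record names the function and its concrete inputs and output, a different implementation could in principle replay the inputs and compare the outputs, which is the kind of cross-environment comparison the abstract refers to.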
Published in 2017 in Proceedings of the 1st Workshop on Enabling Open Semantic Science.
- RML
- RMLMapper
- provenance
- data transformations
- FnO
- metadata
Cite this article in your work
Cite this article easily using its BibTeX entry:
@inproceedings{demeester_semsci_2017,
author = {De Meester, Ben and Dimou, Anastasia and Verborgh, Ruben and Mannens, Erik},
title = {Detailed Provenance Capture of Data Processing},
booktitle = {Proceedings of the 1st Workshop on Enabling Open Semantic Science},
year = 2017,
month = oct,
series = {CEUR Workshop Proceedings},
volume = 1931,
issn = {1613-0073},
url = {http://ceur-ws.org/Vol-1931/paper-05.pdf},
}
Alternatively, pick a reference of your choice below:
- ACM: Ben De Meester, Anastasia Dimou, Ruben Verborgh, and Erik Mannens. 2017. Detailed Provenance Capture of Data Processing. In Proceedings of the 1st Workshop on Enabling Open Semantic Science (CEUR Workshop Proceedings).
- APA: De Meester, B., Dimou, A., Verborgh, R., & Mannens, E. (2017). Detailed Provenance Capture of Data Processing. Proceedings of the 1st Workshop on Enabling Open Semantic Science, 1931.
- IEEE: B. De Meester, A. Dimou, R. Verborgh, and E. Mannens, “Detailed Provenance Capture of Data Processing,” in Proceedings of the 1st Workshop on Enabling Open Semantic Science, 2017, vol. 1931.
- LNCS: De Meester, B., Dimou, A., Verborgh, R., Mannens, E.: Detailed Provenance Capture of Data Processing. In: Proceedings of the 1st Workshop on Enabling Open Semantic Science (2017).
- MLA: De Meester, Ben, et al. “Detailed Provenance Capture of Data Processing.” Proceedings of the 1st Workshop on Enabling Open Semantic Science, vol. 1931, 2017.