Assessing and Refining Mappings to RDF to Improve Dataset Quality
RDF dataset quality assessment is currently performed primarily after data is published. However, there is no systematic way to incorporate its results into the dataset, nor to integrate the assessment into the publishing workflow. Adjustments are applied manually, and only rarely. Moreover, the root cause of the violations, which often lies in the mappings that specify how the RDF dataset is generated, is not identified. We propose an incremental, iterative and uniform validation workflow for RDF datasets that originate from semi-structured data (e.g., CSV, XML, JSON). In this work, we focus on assessing and improving their mappings. We i) incorporate a test-driven approach for assessing the mappings instead of the RDF dataset itself, as mappings reflect how the dataset will be formed when generated; and ii) perform semi-automatic mapping refinements based on the results of the quality assessment. The proposed workflow is applied to different cases, e.g., large, crowdsourced datasets such as DBpedia, or newly generated ones such as iLastic. Our evaluation indicates the efficiency of our workflow, as it significantly improves the overall quality of an RDF dataset in the observed cases.
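As a rough illustration of assessing mappings rather than the generated dataset, the following minimal sketch checks toy mapping rules against a toy schema and proposes a refinement for each violation. It is not the paper's implementation: `MappingRule`, `SCHEMA`, `assess`, `suggest_refinement`, and the `ex:` terms are all hypothetical names and values invented for this example, and real mappings would be far richer than a three-field record.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for one mapping rule (not real mapping syntax).
@dataclass(frozen=True)
class MappingRule:
    subject_class: str    # class assigned to every generated subject
    predicate: str        # property the rule emits
    object_datatype: str  # datatype assigned to every generated literal


# Toy schema with assumed domain/range declarations for two made-up properties.
SCHEMA = {
    "ex:age": {"domain": "ex:Person", "range": "xsd:integer"},
    "ex:name": {"domain": "ex:Person", "range": "xsd:string"},
}


def assess(rules, schema):
    """Return (rule, violation kind) pairs without generating any RDF."""
    violations = []
    for rule in rules:
        expected = schema.get(rule.predicate)
        if expected is None:
            violations.append((rule, "unknown property"))
            continue
        if rule.subject_class != expected["domain"]:
            violations.append((rule, "domain"))
        if rule.object_datatype != expected["range"]:
            violations.append((rule, "range"))
    return violations


def suggest_refinement(rule, schema):
    """Propose a corrected rule, or None when the schema gives no guidance."""
    expected = schema.get(rule.predicate)
    if expected is None:
        return None
    return MappingRule(expected["domain"], rule.predicate, expected["range"])


# Usage: one rule declares the wrong datatype for ex:age.
rules = [
    MappingRule("ex:Person", "ex:age", "xsd:string"),
    MappingRule("ex:Person", "ex:name", "xsd:string"),
]
violations = assess(rules, SCHEMA)

# A person would review each suggestion before accepting it (the "semi" in
# semi-automatic); here every available suggestion is accepted.
refined = [suggest_refinement(r, SCHEMA) or r for r in rules]
```

Because a single faulty rule affects every triple it generates, checking the rule once is cheaper than detecting the same violation repeatedly in the generated dataset, which is the intuition behind assessing mappings first.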
Published in 2015 in The Semantic Web – ISWC 2015.
Cite this paper in your publications
- Use the BibTeX entry to easily refer to this paper.
- Alternatively, you can refer to this paper as: Dimou, A., Kontokostas, D., Freudenberg, M., Verborgh, R., Lehmann, J., Mannens, E., Hellmann, S., et al. (2015), “Assessing and Refining Mappings to RDF to Improve Dataset Quality”, in Arenas, M., Corcho, O., Simperl, E., Strohmaier, M., d’Aquin, M., Srinivas, K., Groth, P., et al. (Eds.), The Semantic Web – ISWC 2015, Vol. 9367, Lecture Notes in Computer Science, Springer, pp. 133–149.