Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections
Unstructured metadata fields such as “description” offer tremendous value for users seeking to understand cultural heritage objects. However, this type of narrative information is of little direct use within a machine-readable context due to its unstructured nature. This paper explores the possibilities and limitations of Named-Entity Recognition (NER) and Term Extraction (TE) to mine such unstructured metadata for meaningful concepts. These concepts can be used to enhance otherwise limited searching and browsing operations, but they can also play an important role in fostering Digital Humanities research. In order to catalyze experimentation with NER and TE, the paper proposes an evaluation of the performance of three third-party entity extraction services through a comprehensive case study, based on the descriptive fields of the Smithsonian Cooper-Hewitt National Design Museum in New York. In order to cover both NER and TE, we first offer a quantitative analysis of named entities retrieved by the services in terms of precision and recall compared to a manually annotated gold-standard corpus, then complement this approach with a more qualitative assessment of relevant terms extracted. Based on the outcomes of this double analysis, the conclusions present the added value of entity extraction services, but also indicate the dangers of uncritically using NER and/
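The quantitative part of the evaluation described above compares extracted named entities against a manually annotated gold standard in terms of precision and recall. A minimal sketch of how such scores are commonly computed follows. The function name `precision_recall_f1`, the exact-match comparison on entity strings, and the sample entities are all assumptions made for illustration; they are not taken from the paper or its corpus.

```python
def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Score a set of extracted entities against a gold-standard set.

    Assumption: an extracted entity counts as correct only if its string
    exactly matches a gold-standard entity.
    """
    true_positives = len(predicted & gold)
    # Precision: share of extracted entities that are in the gold standard.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard entities that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example data, not drawn from the paper's corpus.
gold_entities = {"New York", "Smithsonian", "Cooper-Hewitt"}
extracted_entities = {"New York", "Smithsonian", "Design"}

precision, recall, f1 = precision_recall_f1(extracted_entities, gold_entities)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact string matching is the strictest choice; an evaluation could also credit partial or overlapping matches, which would change the scores.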
Published in 2015 in Digital Scholarship in the Humanities.
- Linked Data
- metadata
- research
Cite this article in your work
Cite this article easily using its BibTeX entry:
@article{vanhooland_llc_2015,
  title = {Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections},
  author = {van Hooland, Seth and De Wilde, Max and Verborgh, Ruben and Steiner, Thomas and Van de Walle, Rik},
  journal = {Digital Scholarship in the Humanities},
  year = 2015,
  month = jun,
  volume = 30,
  number = 2,
  pages = {262--279},
  url = {http://freeyourmetadata.org/publications/named-entity-recognition.pdf},
  doi = {10.1093/llc/fqt067},
}
Alternatively, pick a reference of your choice below:
- ACM: Seth van Hooland, Max De Wilde, Ruben Verborgh, Thomas Steiner, and Rik Van de Walle. 2015. Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections. Digital Scholarship in the Humanities 30, 2 (June 2015), 262–279.
- APA: van Hooland, S., De Wilde, M., Verborgh, R., Steiner, T., & Van de Walle, R. (2015). Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections. Digital Scholarship in the Humanities, 30(2), 262–279.
- IEEE: S. van Hooland, M. De Wilde, R. Verborgh, T. Steiner, and R. Van de Walle, “Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections,” Digital Scholarship in the Humanities, vol. 30, no. 2, pp. 262–279, Jun. 2015.
- LNCS: van Hooland, S., De Wilde, M., Verborgh, R., Steiner, T., Van de Walle, R.: Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections. Digital Scholarship in the Humanities. 30, 262–279 (2015).
- MLA: van Hooland, Seth, et al. “Exploring Entity Recognition and Disambiguation for Cultural Heritage Collections.” Digital Scholarship in the Humanities, vol. 30, no. 2, June 2015, pp. 262–79.