Cleaning Data with OpenRefine
Don’t take your data at face value. That is the key message of this tutorial, which focuses on how scholars can diagnose and act upon the accuracy of data. In this lesson, you will learn the principles and practice of data cleaning, as well as how OpenRefine can be used to perform four essential tasks that will help you to clean your data: 1. remove duplicate records; 2. separate multiple values contained in the same field; 3. analyse the distribution of values throughout a data set; 4. group together different representations of the same reality. These steps are illustrated with a series of exercises based on a collection of metadata from the Powerhouse Museum, demonstrating how (semi-)automated methods can help you correct the errors in your data.
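To make the four tasks concrete, here is a minimal sketch in plain Python rather than in OpenRefine itself. The records, the field names (`id`, `categories`), the `|` separator and all function names are assumptions invented for illustration; they are not taken from the lesson or from the Powerhouse Museum data. The `fingerprint` function is only a simplified stand-in, loosely modelled on key-collision clustering, and does not reproduce OpenRefine's actual implementation.

```python
from collections import Counter, defaultdict

# Hypothetical records: ids, field name and values are invented for illustration.
records = [
    {"id": "1", "categories": "Glass plate negatives|Photographs"},
    {"id": "2", "categories": "photographs |Specimens"},
    {"id": "2", "categories": "photographs |Specimens"},  # duplicate record
    {"id": "3", "categories": "Specimens"},
]


def remove_duplicates(rows):
    """Task 1: keep only the first row seen for each id."""
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique


def split_values(rows, field, separator="|"):
    """Task 2: return one (id, value) pair per value in a multi-valued field."""
    return [
        (row["id"], value)
        for row in rows
        for value in row[field].split(separator)
    ]


def fingerprint(value):
    """Simplified key: trim, lowercase, then sort and deduplicate the tokens."""
    return " ".join(sorted(set(value.strip().lower().split())))


def cluster(values):
    """Task 4: group values that share the same fingerprint."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return dict(groups)


unique = remove_duplicates(records)
pairs = split_values(unique, "categories")
values = [value for _, value in pairs]

# Task 3: count how often each raw value occurs.
distribution = Counter(values)
clusters = cluster(values)

print(distribution.most_common())
print(clusters)
```

In this sketch, "Photographs" and "photographs " (with a trailing space) count as two distinct values in the distribution but fall into one cluster, which is the kind of inconsistency the fourth task is meant to surface.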
Published in 2013 in The Programming Historian.
Cite this chapter in your work
Cite this chapter easily using its BibTeX entry:
@incollection{vanhooland_programminghistorian_2014,
  title = {Cleaning Data with {OpenRefine}},
  author = {van Hooland, Seth and Verborgh, Ruben and De Wilde, Max},
  year = 2013,
  month = aug,
  booktitle = {The Programming Historian},
  editor = {Crymble, Adam and Burns, Patrick and McGregor, Nora},
  url = {http://programminghistorian.org/lessons/cleaning-data-with-openrefine},
}
Alternatively, pick a reference of your choice below:
- ACM
- Seth van Hooland, Ruben Verborgh, and Max De Wilde. 2013. Cleaning Data with OpenRefine. In The Programming Historian, Adam Crymble, Patrick Burns and Nora McGregor (eds.).
- APA
- van Hooland, S., Verborgh, R., & De Wilde, M. (2013). Cleaning Data with OpenRefine. In A. Crymble, P. Burns, & N. McGregor (Eds.), The Programming Historian.
- IEEE
- S. van Hooland, R. Verborgh, and M. De Wilde, “Cleaning Data with OpenRefine,” in The Programming Historian, A. Crymble, P. Burns, and N. McGregor, Eds. 2013.
- LNCS
- van Hooland, S., Verborgh, R., De Wilde, M.: Cleaning Data with OpenRefine. In: Crymble, A., Burns, P., and McGregor, N. (eds.) The Programming Historian (2013).
- MLA
- van Hooland, Seth, et al. “Cleaning Data with OpenRefine.” The Programming Historian, edited by Adam Crymble et al., 2013.