Cleaning Data with OpenRefine

Seth van Hooland, Ruben Verborgh, and Max De Wilde

Don’t take your data at face value. That is the key message of this tutorial, which focuses on how scholars can diagnose and act upon the accuracy of data. In this lesson, you will learn the principles and practice of data cleaning, as well as how OpenRefine can be used to perform four essential tasks that will help you to clean your data: 1. remove duplicate records; 2. separate multiple values contained in the same field; 3. analyse the distribution of values throughout a data set; 4. group together different representations of the same reality. These steps are illustrated with the help of a series of exercises based on a collection of metadata from the Powerhouse Museum, demonstrating how (semi-)automated methods can help you correct the errors in your data.

Published in 2013 in The Programming Historian.

Cite this chapter in your work

Cite this chapter easily using its BibTeX entry:

@incollection{vanhooland_programminghistorian_2014,
  title = {Cleaning Data with {OpenRefine}},
  author = {van Hooland, Seth and Verborgh, Ruben and De Wilde, Max},
  year = 2013,
  month = aug,
  booktitle = {The Programming Historian},
  editor = {Crymble, Adam and Burns, Patrick and McGregor, Nora},
  url = {http://programminghistorian.org/lessons/cleaning-data-with-openrefine},
}

Alternatively, pick a reference of your choice below:

ACM
Seth van Hooland, Ruben Verborgh, and Max De Wilde. 2013. Cleaning Data with OpenRefine. In The Programming Historian, Adam Crymble, Patrick Burns and Nora McGregor (eds.).
APA
van Hooland, S., Verborgh, R., & De Wilde, M. (2013). Cleaning Data with OpenRefine. In A. Crymble, P. Burns, & N. McGregor (Eds.), The Programming Historian.
IEEE
S. van Hooland, R. Verborgh, and M. De Wilde, “Cleaning Data with OpenRefine,” in The Programming Historian, A. Crymble, P. Burns, and N. McGregor, Eds. 2013.
LNCS
van Hooland, S., Verborgh, R., De Wilde, M.: Cleaning Data with OpenRefine. In: Crymble, A., Burns, P., and McGregor, N. (eds.) The Programming Historian (2013).
MLA
van Hooland, Seth, et al. “Cleaning Data with OpenRefine.” The Programming Historian, edited by Adam Crymble et al., 2013.
