Exploring and Cleaning Data with OpenRefine: Setup

Data

The data for this lesson are part of the Edmonton Open Data Catalogue. They list the ten books with the highest number of holds for several weeks since 2015. The data are organized by the home branch of the customers placing the holds (e.g., the top 10 from Woodcroft, the top 10 from Jasper Place, etc.).

Information about the data can be found at https://data.edmonton.ca/Community-Centres/Most-Popular-Books-by-Branch-Edmonton-Public-Libra/qdgm-hex6.

A CSV file of the data can be downloaded from the Edmonton Open Data Portal:

https://data.edmonton.ca/api/views/qdgm-hex6/rows.csv?accessType=DOWNLOAD
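If you would rather fetch the file from a script than through the browser, here is a minimal Python sketch that uses only the standard library. It assumes the URL above still serves the CSV and that the file is UTF-8 encoded. The helper names and the local file name `edmonton_popular_books.csv` are hypothetical, and the preview makes no assumptions about the column names.

```python
import csv
import io
import urllib.request

# Download URL given in the lesson text.
CSV_URL = "https://data.edmonton.ca/api/views/qdgm-hex6/rows.csv?accessType=DOWNLOAD"


def download_csv(url: str, dest: str) -> str:
    """Fetch the file at `url` and save its raw bytes to the local path `dest`."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(dest, "wb") as f:
        f.write(data)
    return dest


def preview_csv(text: str, n: int = 3) -> tuple[list[str], list[list[str]], int]:
    """Return the header row, the first `n` data rows, and the number of data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], [], 0
    header, body = rows[0], rows[1:]
    return header, body[:n], len(body)


def main() -> None:
    # Hypothetical local file name; any name works.
    path = download_csv(CSV_URL, "edmonton_popular_books.csv")
    # "utf-8-sig" also strips a byte-order mark if the file has one (assumed encoding).
    with open(path, encoding="utf-8-sig", newline="") as f:
        header, first_rows, count = preview_csv(f.read())
    print("Columns:", header)
    for row in first_rows:
        print(row)
    print("Data rows:", count)
```

Calling `main()` downloads the file (this needs a network connection) and prints the column names, a few sample rows, and the row count, which is a quick sanity check before you load the same file into OpenRefine.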

Software

For this lesson you will need OpenRefine (formerly Google Refine) and a web browser.

Note: OpenRefine is a Java program that runs on your own machine, not in the cloud. You use it through your web browser, but no internet connection is needed.
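Because OpenRefine serves its interface from your own machine, you can confirm that it has started by checking its local address. The sketch below assumes OpenRefine's default address, `http://127.0.0.1:3333/`; the function name is hypothetical. It only confirms that some local web server answers at that address, not that the server is OpenRefine.

```python
import urllib.error
import urllib.request

# OpenRefine's default local address; change it if you started
# OpenRefine on a different host or port.
DEFAULT_URL = "http://127.0.0.1:3333/"


def openrefine_is_running(url: str = DEFAULT_URL, timeout: float = 2.0) -> bool:
    """Return True if a web server on this machine answers `url` with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or an HTTP error status.
        return False
```

Running `print(openrefine_is_running())` after starting OpenRefine should print `True`; no internet access is involved because the request never leaves your machine.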
