Getting data into OpenRefine
Overview
Teaching: 5 min
Exercises: 5 min
Questions
How can we bring our data into OpenRefine?
Objectives
Create a new OpenRefine project from a CSV file.
Understand potential problems with file headers.
Creating a new OpenRefine project
On Windows, you can start the OpenRefine program by double-clicking on the openrefine.exe file. Java services will start automatically on your machine, and OpenRefine will open in your browser. On a Mac, OpenRefine can be launched from your Applications folder. If you are using Linux, you will need to navigate to your OpenRefine directory in the command line and run ./refine.
OpenRefine can import a variety of file types, including tab-separated (tsv), comma-separated (csv), Excel (xls, xlsx), JSON, XML, RDF as XML, and Google Spreadsheets. See the OpenRefine Importers page for more information.
In this first step, we’ll browse our computer for the sample data file for this lesson. In this case, we will be using data on the most requested books from the Edmonton Public Library. Instructions on downloading the data are available here.
Once OpenRefine is launched in your browser, the left margin has options to Create Project, Open Project, or Import Project. Here we will create a new project:
1. Click Create Project and select Get data from This Computer.
2. Click Choose Files and select the file Most_Popular_Books_by_Branch___Edmonton_Public_Library.csv. Click Open or double-click on the filename.
3. Click Next>> under the browse button to upload the data into OpenRefine.
4. OpenRefine gives you a preview, which is a chance to check that it understood the file. If, for example, your file was really tab-delimited, the preview might look strange. In that case, choose the correct separator in the box shown and click Update Preview (bottom left). If this is the wrong file, click <<Start Over (upper left). There are also options to indicate whether the dataset has column headers included and whether OpenRefine should skip a number of rows before reading the data.
5. If all looks well, click Create Project>> (upper right).
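To see why the preview step matters, the short Python sketch below mimics what happens when a delimited file is parsed with the wrong separator, and how the header and skip-rows options change the result. It is only an illustration using Python's standard csv module, not OpenRefine's own import code. The function name preview_rows, the sample text, and the placeholder column names are all made up for this example.

```python
import csv
import io


def preview_rows(text, delimiter=",", skip_rows=0, has_header=True, limit=3):
    """Parse delimited text and return (header, first few data rows).

    Hypothetical helper for illustration; it is not part of OpenRefine.
    """
    stream = io.StringIO(text, newline="")
    # Discard leading lines, like asking the importer to skip rows.
    for _ in range(skip_rows):
        stream.readline()
    rows = list(csv.reader(stream, delimiter=delimiter))
    if has_header and rows:
        header, data = rows[0], rows[1:]
    else:
        # No header row: invent placeholder names so every column has a label.
        width = len(rows[0]) if rows else 0
        header = [f"Column {i + 1}" for i in range(width)]
        data = rows
    return header, data[:limit]


# Made-up sample: one junk line, then a tab-delimited header and data row.
SAMPLE = "Exported sample\nBranch\tTitle\tHolds\nDowntown\tExample Book\t12\n"

# Wrong separator: the whole header line ends up in a single column.
wrong_header, _ = preview_rows(SAMPLE, delimiter=",", skip_rows=1)
print(len(wrong_header), "column(s) with comma:", wrong_header)

# Correct separator: three separate columns appear.
right_header, right_data = preview_rows(SAMPLE, delimiter="\t", skip_rows=1)
print(len(right_header), "column(s) with tab:", right_header)
print(right_data)
```

A preview that shows everything squeezed into one column is the usual sign that the separator is wrong, which is what the Update Preview button lets you correct before creating the project.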
Note that at step 1, you could upload data in a standard form from a web address by selecting Get data from Web Addresses (URLs). However, this won’t work for all URLs.
Key Points
OpenRefine can import a variety of file types.