Examining Numbers in OpenRefine
Overview
Teaching: 5 min
Exercises: 5 minQuestions
How can we convert a column from one data type to another?
Objectives
Transform a text column into a number column.
Identify non-numeric values in a column using facets.
Numbers
When a table is imported into OpenRefine, all columns are treated as having text values.
As we saw earlier with dates, we can transform columns to other data types
(e.g. date or number) using the Edit cells
> Common transforms
feature.
Here we will experiment changing columns to numbers and see what additional
capabilities that grants us.
Be sure to remove any Text filter
facets you have enabled from the left panel
so that we can examine our whole dataset. You can remove an existing facet
by clicking the x
in the upper left of that facet window.
To transform cells in the Number of Holds
column to numbers, click the down
arrow for that column, then Edit cells
> Common transforms ...
> To number
.
You will notice the Number of Holds
values change from left-justified to right-justified,
and black to green in color.
Exercise
Try Transforming the
Title
column to number. What happens?Solution
Only observations that include only numerals (0-9) can be transformed to numbers. If you apply a number transformation to a column that doesn’t meet this criteria, and then click the
Undo / Redo
tab, you will see a step that starts withText transform on 0 cells
. This means that the data in that column was not transformed.Click above this change in the undo stack so that this failed operation is removed from our undo history.
Numeric facet
Sometimes there are non-number values or blanks in a column which may represent errors
in data entry and we want to find them.
We can do that with a Numeric facet
.
Exercise
- For the
Number of Holds
column, edit one or two cells, replacing the numbers with text (such asabc
) or blank (no number or text).- Use the pulldown menu to apply a numeric facet to the column you edited. The facet will appear in the left panel.
- Notice that there are several checkboxes in this facet:
Numeric
,Non-numeric
,Blank
, andError
. Below these are counts of the number of cells in each category. You should see checks forNon-numeric
andBlank
if you changed some values.- Experiment with checking or unchecking these boxes to select subsets of your data.
- Go to the undo stack in the
Undo / Redo
tab to undo these single cell edits.
Go to the numberic facet and press reset. The graph in the facet gives us a histogram of how many times each number in the column has appeared.
Play with the facet – at the far right of the histogram are the titles with the most number of requests for the users of a branch for a specific week. What was the item that had the most requests at a branch for a week?
When done examining the numeric data, remove this facet by clicking the x
in the upper
left corner of its panel.
Key Points
OpenRefine also provides ways to get overviews of numerical data.