Splitting data
Overview
Teaching: 5 min
Exercises: 5 minQuestions
How do we deal with multiple data in a single field?
Objectives
Learn to split data
Recover from problems when modifying data
Splitting data
As we’ve noticed, the Title column holds more than just a title.
After the title of an item, there is a slash character (surrounded by spaces). We can tell OpenRefine to look for patterns to use as delimitters of separata data items.
- Remove all facets and filters on the data.
 - Go to the menu at the top of the 
Titlecolumn. - Select 
Edit column>Split into several columns ... - Make sure that 
By Separatoris selected - For the separator, type in ‘ / ‘ (a space, a slash, and another space)
 - Press ‘Ok’
 
The Title column is now gone and replaced by three columns:
Title 1, Title 2, and Title 3. The Title 1 column does appear to hold the title, Title 2 appears to hold author information, but Title 3 appears to be blank.
We can facet on Title 3 to find out what data it holds (create a text facet). We discover that every row is blank except for one: Joshua Williamson, Tom King ....
If we click on that row, we discover that the Title 1 is Batman, and Title 2 is The Flash : the button, deluxe edition.
Let’s undo the column split to see what’s going on.
If we now do a text filter on the Title column and search for batman,
we find a number of matches, in particular Batman / The Flash : the button, deluxe edition / Joshua Williamson, Tom King, writers ; Jason Fabok, Howard Porter, artists ; Brad Anderson, Hi-Fi, colorists ; Deron Bennett, Steve Wands, letterers.
The problem here is that the same set of characters that is used to split the title and author information is also used inside of a title!
We can fix this:
- Go to the header menu for the 
Titlecolumn and selectEdit Cells>Replace - Replace 
Batman / The FlashwithBatman/The Flashin the title. 
Remove the filter and split the column using ‘ / ‘. We now just
get two columns Title 1 and Title 2.
We can now rename the columns: in the Title 1 header menu,
go to Edit column > Rename this column, and change the name to
Title.
The Title 2 column can be named to something like Title Extra.
Key Points
OpenRefine can break apart columns that hold multiple data