Know your data
Overview
Teaching: 5 min
Exercises: 5 min
Questions
What information is in our data?
What kinds of questions can our data answer?
What kinds of questions cannot be answered by our data?
Objectives
Think about the data and its potential uses.
What the data says and what it doesn’t say
The data
Each row includes the following information:
- Branch ID
- Branch Name
- Number of Holds
- Title
- Author
- As of Date
- Web URL
Given a row like this one:
Branch ID | Branch Name | Number of Holds | Title | Author | As of Date | Web URL |
---|---|---|---|---|---|---|
EPLLON | Londonderry Branch | 36 | The girl on the train / Paula Hawkins | Hawkins Paula | 03/16/2015 12:00:00 AM | http://epl.bibliocommons.com/search?t=smart&q=the%20girl%20on |
We can interpret this in words as “In the week preceding March 16, 2015, The girl on the train by Paula Hawkins was one of the top-ten most requested items at the Londonderry Branch. There were 36 holds for this item.”
Note that for this week and for this branch, there will be nine other rows containing the other top-ten titles.
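The reading of a row described above can be sketched in Python. This is a minimal illustration, not part of the original lesson: it assumes the data is available as CSV text with the column names listed earlier, and the helper name `describe` is hypothetical.

```python
import csv
import io
from datetime import datetime

# The sample row from the lesson, written out as CSV text.
# Assumption: the real file uses these exact column names and a comma delimiter.
SAMPLE = (
    "Branch ID,Branch Name,Number of Holds,Title,Author,As of Date,Web URL\n"
    "EPLLON,Londonderry Branch,36,The girl on the train / Paula Hawkins,"
    "Hawkins Paula,03/16/2015 12:00:00 AM,"
    "http://epl.bibliocommons.com/search?t=smart&q=the%20girl%20on\n"
)


def describe(row):
    """Turn one row into a plain-language sentence (hypothetical helper)."""
    as_of = datetime.strptime(row["As of Date"], "%m/%d/%Y %I:%M:%S %p")
    # The Title column holds "title / author"; keep only the title part.
    title = row["Title"].split(" / ")[0]
    holds = int(row["Number of Holds"])
    return (
        f"In the week preceding {as_of:%B %d, %Y}, {title!r} was one of the "
        f"top-ten most requested items at the {row['Branch Name']} "
        f"with {holds} holds."
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(describe(rows[0]))
```

Note that the number of holds arrives as text and the date as a timestamp string, so both need converting before they can be compared or summed.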
Questions about the data
Think about the data as a collection, and contemplate the following questions:
Question
Notice that the above row does not give us enough information to tell us what position in the top ten The girl on the train was at Londonderry Branch on March 16, 2015. But can we figure this out from this data set?
Answer
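One possible way to explore this question is to collect all rows for the same branch and date and sort them by number of holds. The sketch below is only an illustration: apart from the first row, the data is invented (placeholder titles and a made-up branch ID `EPLXXX`), and the function name `rank_in_group` is hypothetical.

```python
# Rows as (branch_id, as_of_date, title, holds).
# Only the first row comes from the lesson; the others are invented.
rows = [
    ("EPLLON", "03/16/2015", "The girl on the train", 36),
    ("EPLLON", "03/16/2015", "Placeholder title A", 52),
    ("EPLLON", "03/16/2015", "Placeholder title B", 17),
    ("EPLXXX", "03/16/2015", "The girl on the train", 80),
]


def rank_in_group(rows, branch_id, as_of_date, title):
    """Return the 1-based position of a title, by holds, within one branch and date."""
    group = [r for r in rows if r[0] == branch_id and r[1] == as_of_date]
    ordered = sorted(group, key=lambda r: r[3], reverse=True)
    for position, row in enumerate(ordered, start=1):
        if row[2] == title:
            return position
    return None  # the title has no row for this branch and date


print(rank_in_group(rows, "EPLLON", "03/16/2015", "The girl on the train"))
```

Titles with equal hold counts would need a tie-breaking rule, which this sketch does not define.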
Question
Is there any information in our data set that is either redundant or highly correlated?
Answer
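A candidate pair of columns can be checked for redundancy by testing whether each value in one column always appears with the same value in the other. The sketch below does this for Branch ID and Branch Name. Apart from the Londonderry pair, the values are invented, and `is_one_to_one` is a hypothetical helper; whether the real data set passes the check is not claimed here.

```python
from collections import defaultdict


def is_one_to_one(pairs):
    """Return True if every left value maps to exactly one right value and vice versa."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for left, right in pairs:
        forward[left].add(right)
        backward[right].add(left)
    return all(len(v) == 1 for v in forward.values()) and all(
        len(v) == 1 for v in backward.values()
    )


# (Branch ID, Branch Name) pairs; only the first pair comes from the lesson.
consistent = [
    ("EPLLON", "Londonderry Branch"),
    ("EPLLON", "Londonderry Branch"),
    ("EPLXXX", "Placeholder Branch"),
]
inconsistent = consistent + [("EPLXXX", "Another Placeholder Branch")]

print(is_one_to_one(consistent), is_one_to_one(inconsistent))
```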
Question
If I have a date and a title, can I add up the holds in the rows in the data that match to get an indication of the total number of holds for that item at that date for all of EPL?
Answer
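The addition itself is simple to sketch, but the result needs careful reading: a branch contributes a row only when the title is in its top ten that week, so holds at any other branch are missing from the sum. The rows below are invented except for the first, and the branch ID `EPLAAA` and the function name are hypothetical.

```python
# Rows as (branch_id, as_of_date, title, holds).
# Only the first row comes from the lesson; the others are invented.
rows = [
    ("EPLLON", "03/16/2015", "The girl on the train", 36),
    ("EPLAAA", "03/16/2015", "The girl on the train", 20),
    ("EPLAAA", "03/16/2015", "Placeholder title A", 15),
    ("EPLLON", "03/23/2015", "The girl on the train", 30),
]


def holds_in_top_ten_rows(rows, as_of_date, title):
    """Sum holds over the rows matching a date and title.

    Returns the sum and the number of branches that contributed to it.
    """
    matching = [r for r in rows if r[1] == as_of_date and r[2] == title]
    return sum(r[3] for r in matching), len(matching)


total, branches = holds_in_top_ten_rows(rows, "03/16/2015", "The girl on the train")
print(f"{total} holds across {branches} branches where the title was in the top ten")
```

Reporting the number of contributing branches alongside the sum makes it clearer how much of the system the figure covers.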
Question
If I have a branch name and a title, can I add up the number of holds over the weeks to get the total number of people requesting that item at that branch?
Answer
Question
Can I use the data to determine how many weeks a title was in the top 10 at a particular branch?
Answer
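Since every row represents a top-ten appearance, one way to approach this is to count the distinct As of Date values for a given branch and title. The sketch below assumes one snapshot per week; the later dates and the placeholder title are invented, and `weeks_in_top_ten` is a hypothetical helper.

```python
# Rows as (branch_id, as_of_date, title, holds).
# Only the first row comes from the lesson; the others are invented.
rows = [
    ("EPLLON", "03/16/2015", "The girl on the train", 36),
    ("EPLLON", "03/23/2015", "The girl on the train", 30),
    ("EPLLON", "03/30/2015", "The girl on the train", 28),
    ("EPLLON", "03/30/2015", "Placeholder title A", 40),
]


def weeks_in_top_ten(rows, branch_id, title):
    """Count the distinct snapshot dates on which a title appears for a branch."""
    dates = {r[1] for r in rows if r[0] == branch_id and r[2] == title}
    return len(dates)


print(weeks_in_top_ten(rows, "EPLLON", "The girl on the train"))
```

Using a set of dates rather than a row count guards against counting a week twice if a row happens to be duplicated.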
Key Points
It’s important to think about what questions your data can answer.