I was waiting for a paralegal licensee tour to arrive at the library’s front service desk. As the librarian and I were shooting the breeze, we talked about the anti-theft gates. The gates have a tracking mechanism and, it turns out, the librarian has been keeping a chart of foot traffic from the gates. For 10 years. And no one has used it. The next morning, I had a request for our annual data benchmark update and I started thinking about data, and waste, and outputs.

I’ve blogged before about how year-end is a good time to think about the stories your library tells. One truth about libraries and data is that you need to plan ahead to get the data you need. At the same time, there’s no value in collecting data on the off chance you’ll need it someday.

That was the flip side of the request for annual data. Could we provide more detail about some of the data points we had tracked in previous years? Short answer? No, not this year. And it highlights the balancing act in collecting data to support library narratives.

Over-Collection

The foot traffic gates are a great example of data that is tricky to use. We are open to the public and every person who enters and leaves the library – every one – is counted twice. If they return, they’re recounted. Bathroom breaks, going to the foyer to take a phone call, staff entering and leaving. Everything is counted.

But it’s binary. In, out. The counted foot traffic doesn’t mean anything other than feet moved. If the number goes up, is it because of a special event (Doors Open Toronto, for example) or because there’s more research going on? If it goes down, is it because of a special event (like when the G20 closed down much of Toronto) or because weather kept people away?

Is the value of the library dependent on how many people went in and out of the doors? Interestingly enough, that was one of the metrics the Inner Temple law library used when trying to justify overall library usage. If your law library is restricted to a particular audience and the audience only uses the law library for research, then foot traffic can be useful. Otherwise, it’s just a number.

Over-collection can do more than waste staff time. Our local public library collected gender data for some reason, and poorly trained staff challenged a child over their gender identification. Collect what you need to impact services, and train your staff if they need to get the data from your customers.

My rule of thumb tends to be that (a) there has to be a reliable way of capturing data that ties it to the purpose and (b) there has to be a purpose that has an impact. Foot traffic is too distant from research use. A book removed from the shelf or a database search is as close as you can get to research use without watching the person. But you still need to ask – “why am I counting books removed?” (collection development, library use, etc.) and “why am I counting a search?” (collection development, license use, hour-of-day activity, etc.).

Now that I know that the data is there, I’ll take a look and see if there’s anything useful in it. But I have a feeling there isn’t, and that it’s an opportunity to stop doing something.

Under-Collection

The risk, of course, is that you then start to under-collect. First you stop monitoring the door traffic, then you stop something else, and at year end you get a request for data that you no longer (or never) had.

The request I had this year focused on whether we could provide more granular detail around our data. Like many law libraries, we track reference in three generic buckets: easy, medium, and hard (directional, quick, and complex, respectively).

Reference statistics are a great example of the frustrating complexity of reporting out data. Fundamentally, reference statistics are meaningless outside the reference context unless it is understood that they largely approximate people. If I report 20,000 reference interactions in year A, 19,000 in year B, and 21,000 in year C, my governance folks won’t be able to discern any significant change in the pattern.

±5% isn’t nothing, but it’s also not significant unless it becomes a trend. In my experience, reference question fluctuation mostly reflects how many bodies and hours you can throw at answering reference. The more people and hours, the more questions answered.
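To make the arithmetic concrete, here’s a minimal sketch using the hypothetical counts above. The ±5% noise threshold and the “consecutive same-direction changes” rule are my own assumptions, not a reporting standard:

```python
# Minimal sketch: flag a sustained trend in annual reference counts.
# Counts are the hypothetical figures from the text; the ±5% noise
# threshold is an assumption, not a standard.

counts = {"Year A": 20_000, "Year B": 19_000, "Year C": 21_000}
THRESHOLD = 0.05  # treat swings within ±5% as noise

years = list(counts)
changes = []
for prev, curr in zip(years, years[1:]):
    pct = (counts[curr] - counts[prev]) / counts[prev]
    changes.append(pct)
    print(f"{prev} -> {curr}: {pct:+.1%}")

# Only call it a trend if every change moves the same way, beyond noise.
trend = all(c > THRESHOLD for c in changes) or all(c < -THRESHOLD for c in changes)
print("Sustained trend" if trend else "Within normal fluctuation")
```

Run against those numbers, it reports the year A to B dip as exactly -5.0% and the year B to C bump as about +10.5%, and still calls the series normal fluctuation, which is exactly why governance folks shrug at it.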

To that end, providing funders or governance boards with reference statistics is really only useful as an approximation of work. If we want to answer more reference questions, give me more people. If you take people from me, expect a drop in reference.

That’s different from internal use. The nuance of those questions can help a reference team redeploy its resources. Shift towards directional, quick questions? Maybe a couple of handouts or LibGuides will reduce that drain on reference staff time.

Shift towards complex questions? Time to better understand why law library expertise is needed. The needs analysis may highlight a problem in reporting or an opportunity to build out a new service.
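As a sketch of that internal analysis, assuming your tracker can export simple per-year tallies for each bucket (the numbers below are invented), spotting a shift in the mix can be as simple as:

```python
# Sketch: watch the easy/medium/hard mix shift year over year.
# The bucket tallies are invented for illustration; real numbers
# would come from your reference tracker's export.

by_year = {
    2023: {"easy": 9_000, "medium": 7_000, "hard": 4_000},
    2024: {"easy": 11_000, "medium": 6_500, "hard": 3_500},
}

for year, buckets in by_year.items():
    total = sum(buckets.values())
    mix = ", ".join(f"{b} {n / total:.0%}" for b, n in buckets.items())
    print(f"{year} (n={total:,}): {mix}")

# Prints:
#   2023 (n=20,000): easy 45%, medium 35%, hard 20%
#   2024 (n=21,000): easy 52%, medium 31%, hard 17%
# A drift towards easy/directional questions argues for handouts or
# LibGuides; a drift towards hard/complex questions argues for a
# needs analysis.
```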

But if I’m capturing reference questions (easy, medium, hard) and the report gatherers want more detail (practice area or practice context), I need to have lead time to incorporate that capture. I would also want to have a discussion about “why”. What is the fundamental impact of providing additional detail?

The Right Balance

To be clear, I love capturing data. The more the better. But the reality is that every click in reference trackers like Gimlet, every annotation added to a reference question database, requires time. The more detail you capture, the more effort your team expends capturing it.

To track practice area, we could ask the researcher what area they practice in. Or we could guess. Similarly, we could ask whether they are a solo or in practice with a firm. In a subscription library like ours, we could also ask them to provide a member number. That would allow us to track a lot more data about their interactions with the library, since we require them to log in to our systems.
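If we did capture all of that, the record for a single interaction might look something like this sketch. The field names and categories are hypothetical; they’re not what Gimlet or any particular tracker actually stores:

```python
# Hypothetical record for one reference interaction, if we captured the
# extra detail discussed above. Field names and categories are
# assumptions for illustration, not any real tracker's schema.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ReferenceInteraction:
    when: date
    difficulty: str                          # "easy" | "medium" | "hard"
    practice_area: Optional[str] = None      # asked, or guessed by staff
    practice_context: Optional[str] = None   # e.g. "solo" or "firm"
    member_number: Optional[str] = None      # ties into login data, if given

q = ReferenceInteraction(
    when=date(2025, 1, 6),
    difficulty="hard",
    practice_area="personal injury",
    practice_context="solo",
)
```

Every extra field is deliberately optional: researchers can decline to answer, and each one is another click for the person recording the question.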

Even in a subscription-based law library, asking for more identifying information is tricky. People don’t like to be asked for more detail. Our law library is part of a lawyer regulator, so there is sometimes fear that asking a question will be reported to our discipline teams (it won’t).

There’s also no value in capturing that extra detail. If we are handling more questions for personal injury lawyers working as solos, will we fundamentally change how we are providing reference? No. We might license content differently, or we might need to get training to use tools better or differently, but that output is available without counting heads. And an external decision-maker won’t be in a position to determine whether that type of granularity is meaningful.

Year end is a good time for thinking about some of these issues. You can prepare to capture more data in the new year; starting on January 1 is nice for future comparative data use. And you can also stop capturing data that you haven’t looked at in a year or more.

If you’re like me and you’ve been asked to provide more detail, it’s a great opportunity to have a discussion with the requestor to understand why the detail is needed. It may reflect an information need you weren’t aware of. Or it may reveal a mismatch between how you thought your data was being used at higher levels of your organization and how it actually is.

However it turns out, it’s a good learning opportunity.