Joe Germuska: On Developing Data Tools and Making Data Adaptable
Joe Germuska is Executive Director at the Knight Lab at Northwestern University, or as he calls it, “Chief Nerd”. Starting out as a programmer, he got into journalism in 2009 while working at the Chicago Tribune developing news applications and interactive data sites, such as “Crime in Chicagoland”, which aimed to make all reported crime data in the Chicagoland area searchable and more transparent. Joe was also the project lead for Census Reporter, a project funded by the Knight News Challenge that makes it easier for journalists and the public to access data from the U.S. Census.
In your own words, what is data journalism?
I think it is applying quantitative analysis to reporting – so, finding data sources and analyzing them to get a different perspective on a story.
What are some of the most frustrating processes you run into when working with data?
I don’t do as much data work as I used to, and I’ve never been a really intense quantitative analyst for anything other than curiosity. But a lot of the time it’s just figuring out what’s in front of you. Yesterday, I was figuring out what to do with our billing data from Amazon Web Services, and I kind of wanted to understand where stuff was. I had these data sets that had twenty or thirty columns in there, but for half the columns, the values were identical in every row for us, because we have a simple situation. So I had to figure out, “OK, I can actually discard half of this?”, then I could know what’s in front of me and go from there to thinking about what questions I want to formulate.
So I think before I retire, we’d hopefully have made a lot of progress on describing data systematically so people can more readily recognize what they’re working with and what they might do with it, which for right now is a pretty involved process.
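The column check described above, finding columns whose value never changes from row to row, can be sketched in a few lines of Python. This is only an illustration and not the code used in the anecdote: the function names, column names and sample rows are hypothetical stand-ins for a real billing export.

```python
import csv
import io


def constant_columns(rows):
    """Return the names of columns whose value is identical in every row."""
    if not rows:
        return []
    first = rows[0]
    return [
        name
        for name in first
        if all(row[name] == first[name] for row in rows)
    ]


def drop_constant_columns(rows):
    """Return copies of the rows without the columns that never vary."""
    constant = set(constant_columns(rows))
    return [
        {name: value for name, value in row.items() if name not in constant}
        for row in rows
    ]


# Hypothetical sample data; a real export would have many more columns.
SAMPLE_CSV = """account,currency,service,cost
123,USD,EC2,10.50
123,USD,S3,2.25
123,USD,EC2,7.00
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(constant_columns(rows))          # columns that carry no information here
print(drop_constant_columns(rows)[0])  # first row with only the varying columns
```

Listing the constant columns before dropping them is deliberate: a value that never varies can still be worth noting once (for example, that every charge is in the same currency) before it is set aside.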
How do you think journalism has changed with the way data has been incorporated into it? What are some of the most exciting developments you’ve come across?
So I think the big change in journalism is that data tools are becoming more available and more people have at least a passing familiarity with [them]. Whereas once doing data analysis would require your newsroom to buy you thousands of dollars of software, now you can just download R and do it yourself, and you can also find a community of people online who can help you with this stuff.
The community of people doing data journalism is vibrant and very collegial, and they’ve been helping each other do this stuff for longer than I’ve been doing it at all. But various changes in the Internet and the nature of the world make it easier than ever before for journalists to find each other and get it going. I only got into journalism in 2009, so I can’t really speak to history apart from the stories I’ve heard from people, but I think that’s also one of the exciting developments.
I think it’s great for more people to have basic numerical literacy, and create a story and say, “Well wait, there must be numbers behind this. Can I find them and can I look at them myself? Can I think about them and see how it fits with what people are telling me or what conventional wisdom is?”
When you were at the Tribune, you ran a blog that showed how you got the numbers behind quite a lot of the data analysis, with the tagline “Show Your Work”. Do you see data journalism continuing to move toward transparency with data analysis?
My personal preference and instincts are along the lines of “show your work”. I think at the Tribune, we as a team made it a priority to be as transparent as we could be. But I think that the effort involved is considerable, the payback is hard to quantify, so it doesn’t really surprise me that a lot of news organizations don’t put a lot of energy into that. I don’t think I can really say that they should, but I just wish they would and I’m happy when I see it happen.
It’s a challenge in academia too, where there’s more expectation of some form of reproducibility and building a chain of research. Journalism doesn’t really hold to those kinds of standards, even if the idea of contributing to the public knowledge is there. It’s also the case that historically journalists haven’t dumped out their notes and their details and sources. There’s a little bit of an instinct, especially among older school editors and reporters, to want to keep your stuff close to your chest. So there’s at least some level of that kind of historical instinct that pushes against that transparency, as well as the fact that it’s a fair amount of extra work.
The Knight Lab at Northwestern University where you work has produced some of the most relevant and widely used open source tools for digital journalists today. What are some data tools that you’ve worked on at the Knight Lab? How do these tools help data journalists?
There’s a diversity of answers. TimelineJS sprang from a specific experience that one of our faculty had in his previous career at the New York Times, and he was asking students to tell stories in that form. [He] found that the state of tools for that was just very short of where it ought to be. He figured out a way to come up with something that could help his students focus on the storytelling and not the technology. And that’s a goal for all journalists: their job is not to be tech experts, their job is to convey information.
StoryMap was a natural pivot from that, because a lot of people asked about putting maps in their timelines in a story form. So we thought about it for a while and heard what people described and rather than making TimelineJS more complicated, it made sense to make a separate tool with a different focus, but still prioritize storytelling and easier use for non-technical people.
SoundCite was an invention that one of our students came up with where he wanted to be able to use sounds in his stories, and figured out how it would be possible and actually started the work before it was at the lab. He did an independent study with a faculty member who guided him down the road, and then we helped him finish it up and harden it so it’d be suitable for wider use.
One of the most helpful things we’ve seen has been Census Reporter. Where did the idea for this come from and what was the process of building it?
So the idea came from work I’d done at the Tribune, realizing that there were some things that we did every time we worked with Census data that probably all journalists would do – understanding the questions that journalists had in mind, and the approaches they would take when looking at Census data. It seemed like there ought to be a higher baseline that people could start from instead of the very raw data that the Census Bureau provides. We just thought, there’s got to be some stuff we could do where if we did it once, a whole bunch of people would benefit from it.
That led to the pitch for Census Reporter, which was to hire two full-time developers, a community liaison and a designer to implement the project. I was not expecting to be doing a lot of the coding – not because I didn’t want to but because I thought it would not be possible with my regular job. So the [Knight News Challenge] grant money paid for those staff positions for 15 months, and I was an advisor.
Before, as part of putting in the grant, I’d convened some sort of an advisory board, with myself and two other working journalists who use Census data for their projects. We were going to direct these employees in their work, and then it turned out that one of the people on the board decided to quit his job and become one of the developers on the project as well.
Do you see Census Reporter being developed further?
Going into the future…there’s this easy-to-use, surface level of Census Reporter that’s all that many people ever see, but there are also more advanced tools for people who want to get deeper into the [Census] data in terms of selecting and preparing a data set.
We’ve tried to design it so that our data management process was visible to people and people who had more powerful user goals could take what we did at a more technical level and go somewhere with that too. We want to make it really easy to use for people who need it easy, but also make it open and adaptable for people who are able to get into stuff and take it where they want it to go.
This interview is part of our Summer Data Journalism Series where we speak with data journalists based in Chicago and beyond about their work and challenges with data. The interview has been edited for clarity and length.