Talking With Data Journalists: 5 Takeaways from Our Summer Research Project

September 15, 2016 by Nate Williams

Over the summer, we embarked on a research project to learn more about data journalism and how data journalists work with data.

We’ve had the pleasure and privilege to speak with a wide range of veteran data journalists about their work.  These conversations have been insightful and, frankly, inspirational.  We’ve heard about Chicago-based investigations on criminal justice from Matt Kiefer and Jonah Newman, discussed the role of an editor in data projects with a Pulitzer Prize-winning reporter, Manny Garcia, and learned about sites like Joe Germuska’s Census Reporter that help journalists and citizens access public information more readily.

We also learned a lot about our central question: what are the common bottlenecks data journalists encounter in their data projects?

At the end of it all, we’ve really come to appreciate the field of data journalism and all the work it involves.  Here are five key points that came up across multiple conversations, and some takeaways they suggest about how to make data projects more efficient.

1. A data journalist is just a journalist who’s willing to learn

One of the common threads that ran through many of our interviews was how self-taught many data journalists are. Most of the people we talked to didn’t get a degree in data journalism or set out with it as their initial career focus. They were simply curious about questions they encountered and needed to work with data to answer those questions.  By slogging through one data set after another, they slowly built up a repertoire of data-related skills.

“Now I’m taking on coding, which has been hard, but I’m sticking with it. You want to keep growing so I’ve tried to do that… I try to, when possible, learn things in the context of something I know I’m going to use.” — Jeff Kelly Lowenstein

Takeaway: The learning curve for performing data projects is a bottleneck. Providing easier, more intuitive data tools can help more journalists start working with data.

2. Data journalism is already everywhere

After working on this project, we’ve come to see data journalism not as the “next big thing”, but as a practice that’s already integrated into a wide range of reporting. A lot of people already know that data is a key part of news sites like FiveThirtyEight, but there’s also a ton of data journalism out there on everything from the record-setting Rio Olympics to unraveling the Panama Papers to understanding race, poverty, and suburban commute times in Chicago.

“I can’t think of a really influential investigative project in the last few years that didn’t have some data component.” — Chris Groskopf

“Not too long from now, if not already, there’s just going to be an expectation that when data comes your way, you should be processing it. So I think that “data journalism” is an anachronism waiting to happen. Just call it journalism.”  — Matt Kiefer

Takeaway: The “democratization of data” is already in full swing. Making data more adaptable can help data journalists build on the work of other projects.


3. Sharing is one of the community’s core values

Data journalists seem to be open to sharing pretty much everything: source data, data tools, and code. Many of the data journalists we spoke with highlighted a culture of sharing as one of the main reasons they love doing this kind of work and a key source of their success.  Because the practice of data journalism is still so new in its current form, events like NICAR, sharing open source tools, and collaborating on techniques for dealing with common problems have had a big impact on driving the discipline forward.

“Part of what attracted me to [data journalism] was when I went to my first NICAR and I met all these people, there was no sense of proprietariness about it… There was a very collegial attitude that we’re all in this together and nobody was trying to hide things or keep things secret. And out of that grew, I think, a real spirit of sharing.” — Chris Groskopf

Takeaway: Open data is an essential resource. Making it easier to publish and document new open data sets benefits the entire data journalism community.

4. A lot of caution should go into working with data

As a practice rooted in hard numbers, data journalism can lead to conclusions that seem indisputable.  After all, a fact is a fact is a fact.  But we noticed over the interviews how many data journalists really emphasized the need for caution with data – in particular, being careful not to assume that a data set is always an accurate representation of reality.  Data can be distorted, incomplete, or skewed, and in the process of analysis it’s easy to make faulty assumptions or commit basic errors. Much can go wrong in the process of producing data journalism.

“You still have to be careful, as a responsible journalist, that you are not falling into that trap — that you can sniff out problems in methodology, problems in numbers that have been reported that may be simply incomplete. You have to be careful in thinking about the ways in which people are using data.” — Alden Loury

“I really vet data because data sets are all dirty, [and] because remember: ‘garbage in, garbage out.’” — Manny Garcia

Takeaway: A key need for data projects is having the ability to quickly profile and understand data sets.
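To make "profiling" a little more concrete, here is a minimal sketch of what a first pass over a data set might look like. It is only an illustration, not a tool any of the journalists we interviewed described: the function name `profile_rows`, the column names, and the sample values are all made up for this example. It counts missing cells and distinct values per column and, where every filled-in value parses as a number, reports the minimum, maximum, and median.

```python
import csv
import io
import statistics


def profile_rows(rows):
    """Summarize each column: row count, missing cells, distinct values, numeric range.

    `rows` is any iterable of dicts, such as a csv.DictReader.
    (Hypothetical helper written for this example.)
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            if name is None:
                # csv.DictReader stores unexpected extra cells under None; skip them.
                continue
            col = columns.setdefault(name, {"count": 0, "missing": 0, "values": []})
            col["count"] += 1
            text = (value or "").strip()
            if text == "":
                col["missing"] += 1
            else:
                col["values"].append(text)

    report = {}
    for name, col in columns.items():
        summary = {
            "count": col["count"],
            "missing": col["missing"],
            "distinct": len(set(col["values"])),
        }
        # Treat the column as numeric only if every non-missing value parses.
        try:
            numbers = [float(v) for v in col["values"]]
        except ValueError:
            numbers = []
        if numbers:
            summary["min"] = min(numbers)
            summary["max"] = max(numbers)
            summary["median"] = statistics.median(numbers)
        report[name] = summary
    return report


# Made-up sample data, for illustration only.
SAMPLE_CSV = """place,incidents,year
A,120,2015
A,,2016
B,95,2016
"""

report = profile_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name, summary in report.items():
    print(name, summary)
```

Even a rough summary like this surfaces the blank cell in the `incidents` column before it can quietly skew a total or an average.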

5. Transparency is essential

Because data can be messy, another key challenge for data journalists is evaluating the reliability of work that others publish.  This issue extends to readers as well – how does someone without data skills or access to the data sources verify the conclusions drawn from data?

A key answer to this is transparency.  When it’s done with rigor, full transparency puts everything out in the open about how data was used to produce a story, including the data sources, the code, how certain variables were defined, and the complete methodology that led to newsworthy conclusions.  It ensures that when we see a statistic floating around or a statement like “the middle class is declining”, we know exactly where it came from.

Many of the data journalists we spoke with stressed the importance of transparency.  This is a key point, because it also touches on an even more fundamental question: how do we decide what information to trust? How do we know what’s true?

“If you’re doing it behind closed doors, it’s like, “Trust us”.  Whereas, if you’re putting the data out there, you’re being very open about it and others can do the same analysis and come to the same conclusions that we did. So show your work, just like you used to have to do in algebra tests. You should be showing your work in journalism.” — Andy Boyle

Takeaway: It’s important to be able to trace where a statistic or data visualization comes from. Showing the history and workflow of data projects should be easier.


Going to the source: the full list of interviews

We learned something new and useful from every one of the data journalists we spoke with.  These conversations touched on just about every facet of data projects — from filing FOIA requests and getting access to data to finding the right way to communicate results.  In all, they provided many more interesting points than what we can share here.

Here’s a complete list of who we spoke with along with a link to each interview:

A big thanks to everyone who took the time to meet and speak with us!  We very much appreciate your thoughts on data journalism and the process of turning data into newsworthy insights.