Talking With Data Journalists: 5 Takeaways from Our Summer Research Project
Over the summer, we embarked on a research project to learn more about data journalism and how data journalists work with data.
We’ve had the pleasure and privilege to speak with a wide range of veteran data journalists about their work. These conversations have been insightful and, frankly, inspirational. We’ve heard about Chicago-based investigations on criminal justice from Matt Kiefer and Jonah Newman, discussed the role of an editor in data projects with a Pulitzer Prize-winning reporter, Manny Garcia, and learned about sites like Joe Germuska’s Census Reporter that help journalists and citizens access public information more readily.
We also learned a lot about our central question: what are the common bottlenecks data journalists encounter in their data projects?
At the end of it all, we’ve really come to appreciate the field of data journalism and all the work it involves. Here are five key points that came up across multiple conversations, and some takeaways they suggest about how to make data projects more efficient.
One of the common threads that ran through many of our interviews was how self-taught many data journalists are. Most of the people we talked to didn’t get a degree in data journalism or set out with it as their initial career focus. They were simply curious about questions they encountered and needed to work with data to answer those questions. By slogging through one data set after another, they slowly built up a repertoire of data-related skills.
“Now I’m taking on coding, which has been hard, but I’m sticking with it. You want to keep growing so I’ve tried to do that… I try to, when possible, learn things in the context of something I know I’m going to use.” — Jeff Kelly Lowenstein
Takeaway: The learning curve for performing data projects is a bottleneck. Providing easier, more intuitive data tools can help more journalists start working with data.
After working on this project, we’ve come to see data journalism not as the “next big thing”, but as a practice that’s already integrated into a wide range of reporting. A lot of people already know that data is a key part of news sites like FiveThirtyEight, but there’s also a ton of data journalism out there on everything from the record-setting Rio Olympics to unraveling the Panama Papers to understanding race, poverty, and suburban commute times in Chicago.
“I can’t think of a really influential investigative project in the last few years that didn’t have some data component.” — Chris Groskopf
“Not too long from now, if not already, there’s just going to be an expectation that when data comes your way, you should be processing it. So I think that ‘data journalism’ is an anachronism waiting to happen. Just call it journalism.” — Matt Kiefer
Takeaway: The “democratization of data” is already in full swing. Making data more adaptable can help data journalists build on the work of other projects.
Data journalists seem to be open to sharing pretty much everything: source data, data tools, and code. Many of the data journalists we spoke with highlighted a culture of sharing as one of the main reasons they love doing this kind of work and a key source of their success. Because the practice of data journalism is still so new in its current form, events like NICAR, sharing open source tools, and collaborating on techniques for dealing with common problems have had a big impact on driving the discipline forward.
“Part of what attracted me to [data journalism] was when I went to my first NICAR and I met all these people, there was no sense of proprietariness about it… There was a very collegial attitude that we’re all in this together and nobody was trying to hide things or keep things secret. And out of that grew, I think, a real spirit of sharing.” — Chris Groskopf
Takeaway: Open data is an essential resource. Making it easier to publish and document new open data sets benefits the entire data journalism community.
As a practice rooted in hard numbers, data journalism can lead to conclusions that seem indisputable. After all, a fact is a fact is a fact. But over the course of the interviews, we noticed how many data journalists emphasized the need for caution with data – in particular, being careful not to assume that a data set is always an accurate representation of reality. Data can be distorted, incomplete, or skewed, and, in the process of analysis, it’s easy to make faulty assumptions or commit basic errors. Much can go wrong in the process of producing data journalism.
“You still have to be careful, as a responsible journalist, that you are not falling into that trap — that you can sniff out problems in methodology, problems in numbers that have been reported that may be simply incomplete. You have to be careful in thinking about the ways in which people are using data.” — Alden Loury
“I really vet data because data sets are all dirty, [and] because remember: ‘garbage in, garbage out.’” — Manny Garcia
Takeaway: A key need for data projects is having the ability to quickly profile and understand data sets.
Because data can be messy, another key challenge for data journalists is evaluating the reliability of work that others publish. This issue extends to readers as well – how does someone without data skills or access to the data sources verify the conclusions drawn from data?
A key answer to this is transparency. When it’s done with rigor, full transparency puts everything about how data was used to produce a story out in the open, including the data sources, the code, how certain variables were defined, and the complete methodology that led to newsworthy conclusions. It ensures that when we see a statistic floating around or a statement like “the middle class is declining”, we know exactly where it came from.
Many of the data journalists we spoke with stressed the importance of transparency. This is a key point, because it also touches on an even more fundamental question: how do we decide what information to trust? How do we know what’s true?
“If you’re doing it behind closed doors, it’s like, ‘Trust us’. Whereas, if you’re putting the data out there, you’re being very open about it and others can do the same analysis and come to the same conclusions that we did. So show your work, just like you used to have to do in algebra tests. You should be showing your work in journalism.” — Andy Boyle
Takeaway: It’s important to be able to trace where a statistic or data visualization comes from. Showing the history and workflow of data projects should be easier.
We learned something new and useful from every one of the data journalists we spoke with. These conversations touched on just about every facet of data projects — from filing FOIA requests and getting access to data to finding the right way to communicate results. In all, they provided many more interesting points than we can share here.
Here’s a complete list of everyone we spoke with, along with a link to each interview:
- Alex Richards — Data reporter at NerdWallet — On Data Tools, Challenges, and Being Skeptical
- Abe Epton — Data reporter at KUOW-FM — On Data Journalism: Pandas, Pain Points, and More
- Matt Kiefer — Editor at the Chicago Reporter — Anachronisms, Data Exclusivity and Teaching the Best Tools
- Daniel Hertz — Senior fellow at the City Observatory — Demographic Data and Slim Margins of Error
- Tim Broderick — Data journalism editor at the Daily Herald — On Data Projects and Probing Illinois School Data
- Jeff Kelly Lowenstein — Journalism professor at Columbia College — Using Data to Ask Bigger Questions
- Jonah Newman — Data reporter at the Chicago Reporter — Collaborating on Data Projects
- Andy Boyle — Developer at NBC’s BreakingNews.com — On Developing Data Apps and Presenting Data
- Alden Loury — Director of research at Metropolitan Planning Council — Using Data to Tackle the Hard Truths
- Manny Garcia — Editor of the Naples Daily News — On the Role of an Editor in Data Journalism
- Chris Groskopf — Data reporter at Quartz — On Sharing Data and the Value of Transparency
- Giannina Segnini — Journalism professor at Columbia University — Using Data to Uncover New Stories
- Joe Germuska — Executive director at the Knight Lab — On Developing Data Tools and Making Data Adaptable
A big thanks to everyone who took the time to meet and speak with us! We very much appreciate your thoughts on data journalism and the process of turning data into newsworthy insights.