Manny Garcia: On the Role of an Editor in Data Journalism
Manny Garcia is a Pulitzer Prize-winning investigative reporter who is now the editor of the Naples Daily News. He previously served as the Executive Editor of El Nuevo Herald, the Spanish-language sister paper of the Miami Herald. In his many years of reporting, he has uncovered absentee ballot fraud in Miami, collaborated with the Toronto Star to expose Canadian pedophiles in Cuba, and covered stories spanning Latin America and the Caribbean.
We caught up with Manny over the phone on a busy news day (Mike Pence was announced as Trump’s VP pick in the middle of our call). While he took a brief break from running the newspaper, he told us about journalism’s evolution since the 90s, the pain of getting data from counties, and why he double and triple checks his reporters’ data work.
You’ve been an investigative journalist for decades. How have you seen the use of data in investigative projects change?
Well, it’s increased. The use of data has increased because more and more information is stored electronically rather than on paper. So what I would recommend, at any point in your career, is to get comfortable using data. Even something as simple as learning to use Excel [to] look at political campaign contributions. Everything is electronic. You can basically go into some state files or federal filings and export right into Excel. Often, you’ve even got the local races where campaigns will send an email with all their contributions, or turn in a CD or flash drive with contributions or expenditures on them. The amount of data is only increasing, so becoming comfortable [and] being able to use that data accurately is very important.
Government, elections, [and] government contracts. Let’s say, for example, you’re covering the City of Chicago. You’ll want to go to the county manager’s office or the county administration and say, “I want a copy of your checkbook, I want a copy of your general ledger.” You want to see how the county spends its money — lists of all contractors, everybody who’s a vendor, who’s got a city or government contract. And that’s important because then you’ll want to look at campaign contributions to the mayor, to the council members, commissioners, to see if many of the same people who are giving money are doing business with the city or the county.
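The cross-referencing Garcia describes — checking whether campaign donors also show up on the government’s vendor list — is, in practice, a join on a cleaned-up name field. A minimal sketch in Python with pandas (all column names and records here are hypothetical; real exports vary widely by jurisdiction):

```python
# Sketch: cross-reference a county vendor list with campaign
# contributions. Column names and data are invented for illustration.
import pandas as pd

vendors = pd.DataFrame({
    "vendor_name": ["ACME PAVING LLC", "RIVERSIDE IT GROUP", "GOMEZ & SONS"],
    "contract_total": [1_250_000, 480_000, 96_000],
})

contributions = pd.DataFrame({
    "contributor_name": ["Acme Paving LLC", "Gomez & Sons", "Jane Citizen"],
    "recipient": ["Mayor Smith", "Commissioner Lee", "Mayor Smith"],
    "amount": [5_000, 2_500, 150],
})

# Normalize names before joining -- a crude stand-in for the real
# cleanup matching requires (punctuation, suffixes, DBA names, etc.).
vendors["key"] = vendors["vendor_name"].str.upper().str.strip()
contributions["key"] = contributions["contributor_name"].str.upper().str.strip()

# Inner merge keeps only names that appear on both lists.
overlap = vendors.merge(contributions, on="key")
print(overlap[["vendor_name", "recipient", "amount", "contract_total"]])
```

Exact string matching like this misses variants (“Acme Paving” vs. “ACME PAVING LLC”), which is why, as Garcia notes later, the analysis still has to be vetted by hand.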
What is the role of an editor in data projects? Is there a process you go through to make sure the data analysis is accurate?
Our editors tend to be more well-versed on data. What we do internally is to ask, “How did you get this data? Where did you come up with these facts? Who gave you this information?”
There’s a great reporter at the Chicago Tribune who does data work named Jason Grotto. Jason is a scientist who is good with data and decided he wanted to be a reporter. I would sit in Jason’s office and he had the paranoia of an airplane pilot — how many ways can you crash? He would sit there and say, “How many ways can I screw this up? Why do I believe my stuff?”
You’ve got to ask those questions. That’s part of the editing job. You should do that running the paper.
At the start of a project, how do you go about checking the data? What do you do to verify the integrity of the source data?
I really vet data because data sets are all dirty, [and] because remember: “garbage in, garbage out.” Just because a government agency is giving you information doesn’t mean the agency’s right. You need to find out what’s the original source, how did they get it, what other organizations keep similar data sets.
Let’s say you’re looking at the Department of Corrections in Illinois. First of all, [the initial step] is getting to somebody in the Department of Corrections who is not a public information officer — where you talk to someone who is actually the programmer, and say “Where do you get this data from? How flawed is the data? Do you find yourself cleaning it up?”
The other thing you want to ask for is, “How do you keep this data? What kind of system is it stored in?” And always ask to see the fields — “I would like to get an inventory of all the fields you have in the data.”
Data is just data, but you have to put a face to it. That’s going to come from analyzing a lot of data to find the person who’s going to be your protagonist and who’s going to carry your story.
I did a project with Jason Grotto several years ago that looked at sentencing patterns [and] racial disparities in Florida, and the use of a perk called the withhold of adjudication. Jason analyzed data from the state of Florida, local counties, merging it all using names, date of birth, Social Security numbers, and addresses to see if there were racial disparities in sentencing. Our methodology was to look at two people, two defendants who had the same criminal history, the same primary charge, but the white offender had a worse criminal record than the black offender, yet the white offender walked out of court with a much lighter sentence or deal.
Jason and I looked at over a thousand case files. And to stand up our reporting, we found individuals who we could take pictures of [and] interview, including the victims of the crime, to show how the withhold of adjudication was being abused and how it favored white defendants. It ended up changing Florida law. It won an IRE award. But the point is that you’ve got to find the face to carry your story.
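The record linkage Garcia describes — merging state and county records on shared identifiers, then comparing matched defendants — can be sketched in a few lines of pandas. Every field name and value below is hypothetical; this only illustrates the shape of the merge-and-compare step, not the project’s actual methodology:

```python
# Sketch: link two record sets on multiple identifiers to reduce
# false matches, then compare matched defendants on the same charge.
# All data and column names are invented for illustration.
import pandas as pd

state = pd.DataFrame({
    "name": ["J. Doe", "R. Roe"],
    "dob": ["1970-01-01", "1968-05-12"],
    "primary_charge": ["burglary", "burglary"],
    "prior_felonies": [3, 1],
    "race": ["white", "black"],
})

county = pd.DataFrame({
    "name": ["J. Doe", "R. Roe"],
    "dob": ["1970-01-01", "1968-05-12"],
    "adjudication_withheld": [True, False],
})

# Merging on name AND date of birth is less error-prone than
# name alone; real projects add more identifiers still.
linked = state.merge(county, on=["name", "dob"])

# For each charge, inspect whether the lighter outcome tracked
# race rather than criminal history.
for charge, group in linked.groupby("primary_charge"):
    print(charge)
    print(group[["name", "race", "prior_felonies", "adjudication_withheld"]])
```

In this toy pair, the white defendant with the worse record gets the withhold — the pattern the reporting looked for across thousands of real matched cases.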
As an editor now, do you still personally work with data in the course of reporting or is it mostly about monitoring your reporters?
Monitoring and asking questions. Where did we get the data set that we’re going to crunch? Do we trust the data set? Do we trust it to be accurate? Has anyone done a similar analysis?
The other thing when you’re building projects off of data analysis is you’ve got to have people you can run your data by. As a rule, I like to send out our findings [for verification]. That’s what Jason and I did with the withhold of adjudication project. We sent it to a guy at Florida State University who was the head of all data for the Florida Department of Corrections, so he knew this data cold. And we went back and forth, “well this is wrong”, “you need to scrub for this”, “you need to account for that”. Then we sent it to the Department of Corrections, and said this is the data and this is how we’ve done this analysis. They cried foul on a couple of things — it’s one thing to cry foul, but is it wrong?
The long and short of it is when you’re doing data analysis, be transparent. Let people know this is what we found. No one should go on the website or see a story on the 11 o’clock news and be sandbagged.
This interview is part of our Summer Data Journalism Series where we speak with data journalists based in Chicago and beyond about their work and challenges with data. The interview has been edited for clarity and length.