Jeff Kelly Lowenstein: Using Data to Ask Bigger Questions
Jeff Kelly Lowenstein currently lectures on journalism at Columbia College and was previously the database and investigative editor at Hoy, the Chicago Tribune’s Spanish-language newspaper. He spent part of 2012 in Santiago, Chile, where he taught as a Fulbright scholar at the University of Diego Portales. His numerous projects include investigating nursing home inequality with the Center for Public Integrity, examining the enduring rifts in Chilean society 40 years after the Pinochet coup, and blogging for the Huffington Post.
For me data journalism is using publicly available data to strengthen or fortify your reporting. I see it as having a number of different phases: acquiring the data, cleaning the data, analyzing the data, visualizing the data. It’s a cycle that usually generates more questions, more different types of angles you want to look at, which then requires [getting] more data. So that’s what I see as data journalism — using data to ask more probing and systemic questions than what you can do by only working on individual cases.
You mentioned the process of analyzing, collecting, and visualizing data. Do you do all of that? What kind of work do you do with data?
Definitely, I do all the steps of the process. I’m actually teaching an investigative journalism class here in South Africa and I’ve talked to the students about how it’s helpful to be able to do each of these steps so that you can understand the process. Sometimes, laying things out on a map or displaying things visually can help show you certain things that might not be as clear or accessible as if you’re looking at them in a less dynamic, less visually oriented format.
Have you ever had problems identifying similar records in two different data sets, joining two data sets, or working with different formats? What are the challenges you face when trying to clean up data?
Pretty much any data set you work with is going to be dirty. You just don’t get clean data and so it’s a very important part of the process because on a very basic level, the integrity of your analysis is contingent on the full range of data that you’re able to say, “this is what we’ve looked at.” Some of the challenges are just [inconsistencies]. There’s one place that’s spelled “Street” and the other one is “St” and the other one is “Streete”. That’s hard because you really have to be attentive to detail to make sure that you’re working that through as clearly as possible. I work a certain amount in OpenRefine and that’s good, but then you’re sort of dependent on their suggestions on clusters and you still need to go through and make sure that they’ve cleaned it up. I’m putting my name on this [work]; OpenRefine isn’t.
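As an editorial aside, the kind of inconsistency described here ("Street", "St", "Streete") can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the interview and not how OpenRefine works internally: the variant table, the function names, and the sample addresses are all hypothetical. Records that normalize to the same key are grouped, and any group containing more than one spelling is flagged for a person to check by hand.

```python
import re
from collections import defaultdict

# Hypothetical variant table (an assumption for illustration only).
# Note that "St" can also mean "Saint", which is one reason a human
# still has to review every suggested cluster.
SUFFIX_VARIANTS = {
    "st": "street",
    "str": "street",
    "streete": "street",
    "street": "street",
}


def normalize_address(raw: str) -> str:
    """Lowercase, strip punctuation, and map known suffix variants to one spelling."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(SUFFIX_VARIANTS.get(tok, tok) for tok in tokens)


def cluster_addresses(addresses):
    """Group raw address strings by their normalized key."""
    clusters = defaultdict(list)
    for addr in addresses:
        clusters[normalize_address(addr)].append(addr)
    return dict(clusters)


# Made-up sample records, not data from any project mentioned in the interview.
records = ["123 Main Street", "123 Main St.", "123 main Streete", "45 Oak Ave"]
clusters = cluster_addresses(records)

for key, members in clusters.items():
    # More than one distinct raw spelling means the merge deserves a manual look.
    flag = "REVIEW" if len(set(members)) > 1 else "ok"
    print(f"{flag:6} {key!r} <- {members}")
```

The sketch only proposes groupings. Deciding whether a flagged cluster really is the same place remains a manual step, which is the point made in the answer above.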
Anything else that’s extremely frustrating? Things with data that make you want to beat your head against the wall?
To me, that’s the stuff that makes it worthwhile. I would talk to the interns when we were clicking through all those court cases and some of them were kind of complaining and I was saying, but this is what makes us the Chicago Reporter. The fact that nobody else will do it but we have the time, we have the support, we have the editorial backing. It’s not that fun, it is slow, it can be frustrating, but in the end what you will reveal is worth it.
How has the way you’ve worked with data changed over the years? What are some things that you’ve learned while working on the job?
The first project I did was basically in 2004. I was working for a weekly newspaper on the south side, just south of Hyde Park, and I basically tallied the number of sex offenders by police beat and district by hand. There were like 300 beats and so I just kind of went [through] each one, tallied them up, counted and double counted them. So that was ’04.
Then I started working at the Reporter, got familiar with doing some basic analysis with Excel. I hired a guy for a project I was on in 2006 to do a regression analysis, and we said, “hey, maybe one day we can learn how to do that.” In 2007, we went and got training in how to do SPSS [and] we were able to do some relatively sophisticated analysis there. I didn’t know how to do maps at all, and I can map now.
Now I’m taking on coding, which has been hard, but I’m sticking with it. You want to keep growing so I’ve tried to do that. I find it’s better if you’re doing it in the context of a project because then you just have an application rather than doing a series of exercises. I try to, when possible, learn things in the context of something I know I’m going to use.
How does the availability of data change the way you approach an investigative project? Are there any unique insights you can only get by using data?
I did a project last year for the Chicago Reporter where there was this guy, Mark Diamond, who was just scamming elderly black homeowners on the south and west sides over and over again. It was just terrible. Basically, this community organizer, Reverend Hood, came to me and said, you know my aunt got ripped off by this guy, she’s in danger of losing her home. So I said, let’s take a look and see if he’s done it to other people and what we found is this guy had been ripping people off for 30 years. There was a court record of almost 100 court cases, circuit court, federal court going after him going back to 2002. [Attorney General] Lisa Madigan had said I’m going to shut this guy down, I’m going after him, and he was still operating.
By looking at the previous history, it allowed us to ask much more pointed questions and not just write “this is a sad story about one 88-year-old lady on the west side being ripped off by some guy” but “this guy has been doing it for 30 years and hasn’t faced one single criminal charge.” It’s using documents or data in different formats to ask more pointed questions about what’s happening in the system. Sometimes it’s CSV, sometimes it’s text, sometimes it’s Excel, sometimes it’s Word. But basically, the consistent goal is getting [data files] into a workable format, asking the questions, and then connecting it to policy, but also at the end of the day asking what does this mean for people’s lives? How are people who are living on the ground experiencing this?
Of all the data projects and the reporting you’ve done, which has been the most rewarding, or the project you’ve been the most proud of?
The nursing home project I did in 2014, that was really a big one. I had basically started working on that issue back in ’04 when I was on the south side and then a little bit in the years in between. I found out through the Affordable Care Act there was this national data set that nobody had used. Originally I just wanted to do one story but I kept going and did this three-part series that was national and each of which had impact. The people that I wrote about, I felt that we did a pretty good job of documenting and sharing their experience. We published in English, we published in Spanish, there was a lot of different pick-up. It contributed to federal policy change, which is a big thing: people filed legislation based on it. That doesn’t happen every day.
_This interview is part of our Summer Data Journalism Series where we speak with data journalists based in Chicago and beyond about their work and challenges with data. The interview has been edited for clarity and length._