Alden Loury: Using Data to Tackle the Hard Truths
Alden Loury is currently the Director of Research and Evaluation at Metropolitan Planning Council, a nonprofit community organization for Chicago. This summer, he’s been working on a series of blog posts highlighting the growth of segregation in the suburbs, the difficulty of commuting to suburban jobs, and the population decline in Chicagoland. Alden previously worked as a senior policy analyst at the Better Government Association and as a reporter, senior editor, and publisher at the Chicago Reporter. His print journalism career stretches back to the 1990s, when he got his start at the News-Gazette in Champaign, IL.
We were lucky enough to have Alden come to our office a few weeks ago for a conversation about the dispassionate power of data, the importance of sharing data resources, and his first ever data journalism story.
The first time I worked with data was when I was at the News-Gazette. I did a story on how the City of Champaign [had] an ordinance that essentially required vendors who were doing business with the city to provide information about the diversity of their workforce. So, every time the city entered into a contract with a vendor, the vendor was required to complete a form [giving] a breakdown of their workforce by racial and ethnic group. If the human resources office deemed that those numbers were too low, then the vendor was required to provide a plan on how to improve those numbers.
The use of data in that story was tracking those numbers, showing the level of diversity or lack thereof for these companies. I had information for about 75 different vendors, including the newspaper that I worked for, which had an abysmal record. I knew from observation that I was one of two or three black people who worked for the company, which had maybe about 75 employees. The News-Gazette, in addition to a host of other companies, was front and center when we printed this story showing their lack of diversity.
You have to spend a whole load of hours just to clean up the data. That’s a real part of the process. There’s some folks who will get stuff and they’ll just say, “I can’t do anything with this.” But a data set that’s completely unusable can become a very rich data set for you — it just takes a little investment of time to clean the data and get it ready for yourself.
There’s also inspecting the data. You’re running queries, you’re scanning the data, you’re spending some time looking it over, making sure there aren’t glaring errors in the data or missing records. Things like that can trip you up. If you don’t realize it, you could spend hours analyzing a data set and be like, “This doesn’t have all the records that I asked for; there’s a hole here.”
If you spend a little time when you first open up the data set, you can quickly notice when something’s wrong. But if you don’t do that, you could waste a lot of time before you realize the issues.
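Editor's illustration: the kind of first-pass inspection Alden describes could look something like the Python sketch below. This is not his actual workflow. The sample records, the column names (`case_id`, `filed_date`, `outcome`) and the `inspect_rows` helper are all hypothetical, made up for this example. The sketch counts rows, blank cells and duplicate IDs, and lists any months with no records at all between the first and last date, which is one simple way to spot "a hole" early.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a real records file.
SAMPLE_CSV = """case_id,filed_date,outcome
1001,2008-06-06,guilty
1002,2008-06-09,
1003,2008-07-15,dismissed
1003,2008-07-15,dismissed
1005,2008-11-02,guilty
"""


def month_span(first, last):
    """Yield every 'YYYY-MM' string from first to last, inclusive."""
    year, month = int(first[:4]), int(first[5:7])
    last_year, last_month = int(last[:4]), int(last[5:7])
    while (year, month) <= (last_year, last_month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def inspect_rows(rows, key_field, date_field):
    """Summarize row count, blank cells, duplicate keys and empty months.

    Dates are assumed to be ISO strings (YYYY-MM-DD).
    """
    rows = list(rows)
    blanks = Counter()   # blank cells per column
    keys = Counter()     # how often each ID appears
    months = set()       # months that have at least one record
    for row in rows:
        for field, value in row.items():
            if value is None or (isinstance(value, str) and not value.strip()):
                blanks[field] += 1
        keys[row[key_field]] += 1
        if row[date_field]:
            months.add(row[date_field][:7])
    missing = []
    if months:
        missing = [m for m in month_span(min(months), max(months))
                   if m not in months]
    return {
        "row_count": len(rows),
        "blank_cells": dict(blanks),
        "duplicate_keys": sorted(k for k, n in keys.items() if n > 1),
        "missing_months": missing,
    }


summary = inspect_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)),
                       key_field="case_id", date_field="filed_date")
for name, value in summary.items():
    print(name, value)
```

On a real file you would swap the sample string for an opened CSV file and compare `row_count` against the number of records you requested. A month with zero records is not always an error, but it is worth a question to whoever supplied the data.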
After you’ve cleaned the data, how do you approach the process of analyzing the data to uncover a story?
I like to think about data as a source. So if you’ve got a data set that is the results of every court case in Cook County in a five year period of time, think of this as if there was a way to get the minds of every judge in Cook County that decided a case in a five year period of time embodied in one person. You can ask that person, “Remember the case from June 6th of 2008?” And he’s like, “Yes, I remember that case. The defendant was Joe. He was up for murder and we found him guilty on all charges, so we sentenced him to thirty years in jail.”
That’s essentially what you have with these data sets. It’s almost like this omniscient being that knows all of these details. Well, if you had that being sitting in front of you, what would you ask them? That’s the kind of power that you have with these data sets.
The Chicago Reporter is known for investigating systematic trends of race and poverty. Is it easier to get people to confront these types of hard truths using data?
This is one of the reasons that the Chicago Reporter used data journalism from its inception. It was writing about issues of race and issues of class. Those are very emotional issues, and issues that people bring a lot of their own personal opinions to, sometimes stereotyped opinions. The data was a way to cut through all of that. What the numbers tell us, and hopefully enlighten all of us about, is “Here’s what we may think, what we believe in, but here’s what’s actually happening in our community.” That, I think, is the real utility in using and analyzing data.
We rely on numbers as a society for practically everything we do, whether it’s the weather, the markets, or the scores from last night’s ball games. There’s a lot about our world that is shaped by numbers, and those numbers have a level of definiteness to them: we made 6% less in profit last quarter than we did a year before.
By the same token, whether talking about segregation or poverty or talking about mental health issues, numbers can tell us a lot of very helpful and useful things about what’s happening in our world, even in these very complex and nuanced issues. They can provide at least some foundation to say, “OK, we know this is happening.” Even if we don’t know how to deal with it, what we do know is this is what’s happening. On some level, things can’t be disputed.
What are some of the limitations or potential pitfalls of using data to understand these social issues?
The tricky part is that even in a [data-centric] world, numbers can be manipulated. People can take numbers and pull out of that analysis the findings that most align with whatever point they’re trying to get across.
You still have to be careful, as a responsible journalist, that you are not falling into that trap — that you can sniff out problems in methodology, problems in numbers that have been reported that may be simply incomplete. You have to be careful in thinking about the ways in which people are using data.
In some of your stories in the Chicago Reporter, there are links to download the data sets. Why was it important for you to do that?
The data itself can be a resource. Within a story, even if it’s a 3,000 or 3,500 word story, when you’re dealing with a complex and sophisticated set of information, we can’t tell you every single thing that we thought the numbers said. One data set I remember was 8 million records. We were telling people what were the key things we found analyzing a data set of 8 million records, but there are so many more things that can be gleaned from that information.
So along with the trends and other things that we identified, we’d put as much of that stuff out there as we could for people to download and play around with, because we would take advantage of the opportunity to do that when other folks put data out.
I have done that. There were times when I was at the Reporter when we partnered with organizations and sometimes when I was at the Better Government Association that we would partner with other media partners. First and foremost, anytime you’re doing any kind of data analysis, you’ve got to share the methodology. It’s a way of being transparent and having integrity, but it’s another part of the story because how you look at a piece of information can really change the way you look at the results.
Very few places do what we were trying to do at the Chicago Reporter, which is sharing that information. But ultimately, thinking about it from a community level, maybe you can’t get to it right away and if you really believe in the utility of what you’re doing, doesn’t it serve the greater good of the public to ensure that somebody will pick up on it? If it’s not you, why not allow it to be someone else?
This interview is part of our Summer Data Journalism Series where we speak with data journalists based in Chicago and beyond about their work and challenges with data. The interview has been edited for clarity and length.