Giannina Segnini: Using Data to Uncover New Stories
Giannina Segnini is the James Madison Visiting Assistant Professor of Journalism at the Graduate School of Journalism at Columbia University and the Director of the Master of Science Data Concentration Program. Prior to her academic career, Giannina worked as the editor of the investigative unit at La Nacion in Costa Rica. She has been involved in dozens of noteworthy investigative projects over the years, from building the Offshore Leaks database to assisting with shipping and trade data in the Panama Papers. Her work has garnered a long list of accolades and prizes, including the Maria Moors Cabot Award and the Excellence Award from the Gabriel García Márquez Foundation.
We caught up with Giannina over the phone to ask her about how data journalism has evolved over the years, the challenges of cross-border investigation, and her advice to her students about getting started in data journalism.
In your own words, what is data journalism?
Data journalism is, one, quality journalism, and you can improve it or empower journalism by using data. So it’s basically the use of data to find relevant stories. It’s not just visualization, it’s much more than that. It’s about finding stories out of the data.
How have you seen the use of data in investigative projects change?
There are many ways. Years ago, in the 1990s, we used to call this computer-assisted reporting, so the use of data has been there for years; this is not something new. But it’s changing as more journalists get familiar with the use of data in different ways.
You can use data for any kind of journalistic piece, from sports to entertainment to politics. It has changed in its uses and it has also changed in formats: data is not just text. Now we’re seeing other ways to collect data, from sensors or drones, or getting data from pictures, metadata, video. So this is happening, and it’s changing not only in the formats of what we call data, but also in its application in journalism.
When you were working with La Nacion in Costa Rica, a big part of your work was building a set of databases. What went into collecting this data?
Just to give you a little bit of context, I started at the investigative unit in 1994. First it was a little unit because it was just me, but then we [had] three investigative reporters who were basically using traditional methods. So, when we decided to rethink the structure of the unit and we brought [in] computer scientists and geographers, what we did was very different from what others were doing.
Apart from the investigations we were working on, we were collecting public data from multiple sources. Three years later we had basically every public record available consolidated on a server.
How did having these new data resources affect your reporting?
It changed the way in which we were doing reporting. In the past, it was thinking about a story then collecting the data that you needed to work on that story. This [new] model allowed us to collect the data [on] everything and start thinking creatively [about] how to combine multiple data sets we had and how to come up with projects in the public interest.
We were not only more productive, but also more creative. It’s a good practice to have an inventory of what’s available where and the format of everything. It allows you to start playing chess. You cannot play chess if you don’t see each of the pieces and the overview [of the] whole game. It’s the same thing here. If you don’t know what’s available, it’s hard to start thinking of good ideas for data journalism.
When you created these databases, how did you find and match related records between different data sets?
We had a very convenient situation, which is that in Costa Rica, we have an ID number that’s public. You can actually download the whole data set of Costa Rican [citizens] with their ID numbers. You’re indexed with this number everywhere you go, not just in public databases, but also in private ones. That was great because [in] all the data sets, all we did was cross-reference the names and make sure that the ID numbers corresponded.
Here [in the U.S.], the situation is different, because when you’re dealing with Offshore Leaks or the Panama Papers, we have people from all over the world with no ID number and it’s difficult to de-duplicate records.
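[Editor's illustration: the cross-referencing described above can be sketched in a few lines of Python. This is a minimal sketch, not the La Nacion team's actual method. The record layout (`id` and `name` fields), the function names, and the sample people and ID numbers are all hypothetical. It joins two data sets on a shared ID number and then checks whether the names agree after loose normalization.]

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents, and collapse whitespace so names compare loosely."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def cross_reference(records_a, records_b):
    """Join two record lists on 'id' and flag whether the names agree."""
    names_by_id = {rec["id"]: rec["name"] for rec in records_b}
    matches = []
    for rec in records_a:
        other_name = names_by_id.get(rec["id"])
        if other_name is None:
            continue  # this ID does not appear in the second data set
        matches.append({
            "id": rec["id"],
            "name_a": rec["name"],
            "name_b": other_name,
            "names_agree": normalize_name(rec["name"]) == normalize_name(other_name),
        })
    return matches


# Hypothetical sample data; the IDs and names are invented for illustration.
property_owners = [
    {"id": "1-0000-0001", "name": "María Pérez"},
    {"id": "1-0000-0002", "name": "Juan Mora"},
    {"id": "1-0000-0003", "name": "Ana Solís"},
]
contractors = [
    {"id": "1-0000-0001", "name": "MARIA  PEREZ"},
    {"id": "1-0000-0002", "name": "Carlos Mora"},
]

results = cross_reference(property_owners, contractors)
for row in results:
    print(row)
```

[A row whose names disagree despite a shared ID is a candidate for manual checking rather than an automatic match. Without a shared ID, as in the leaked data sets mentioned above, this exact join is not available and matching has to rely on approximate name comparison, which is why de-duplication becomes much harder.]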
Are there other differences in working with data from the U.S. compared to other countries?
Here in the U.S., it’s very easy to have access to public records. When I’m talking about public records I’m talking about court records, procurement contracts, and also everything coming from the Securities and Exchange Commission. That’s what I call public records.
In the U.S., it’s pretty easy to have access to that information in every state, so you have state-level data and you have federal data and all of it is valuable. It’s not necessarily in the best formats possible, but at least there’s a very open culture of giving access. Not just journalists, but citizens in general have access to this kind of data.
Europe is particularly different. You cannot get public records on people or companies as easily as you can here. Basically, they prioritize legislation that protects people’s privacy. That goes first, [so] it’s a little bit difficult to get, for instance, property records or trade records, or this kind of information.
What are some of the ways you encourage other reporters to get involved with using data?
You have to be passionate about what you’re doing, or at least you have to really love the topic. Everything’s about thinking, it’s not just about typing. I encourage them to think about a particular problem that they’re interested in and look at it from different angles and then visualize reality, because data is one representation of reality.
So, the two [pieces of] advice would be to try to find something you really care about and to try to keep it simple. This is like driving. If you expect to be a professional driver from day one, that’s not going to happen. It takes time. Use one single data set and start practicing asking the right questions, because that’s the most difficult part. The rest, even the technique, the programming and everything, that’s mechanics. I’ve seen people who know how to program in every language, but when they have data in front of them, it’s hard [for them] to connect the data with reality.
Previously, you’ve talked about how journalism is a better profession now than ever because journalists have so many ways to innovate with new platforms and methods. What are some examples of these innovations that come to mind?
The use of sensors to collect data allows you to do things you were not able to do in the past. This is an investigation we never finished, but Costa Rica has 25% of its territory protected and we wanted to send sensors to all these places just to check if the forest and species that they claim were protected were real. Imagine if you had to go walk around all these places: it’s practically impossible. But you can use satellite pictures to do this. This is one idea, but here at the school I see students every day who are using [new] ways to collect data.
Of all the different projects that you’ve worked on, which one have you been the most proud of?
It’s a lesser-known project. No one remembers it. Ten percent of our population in Costa Rica are immigrants, especially from Nicaragua, and they do the hard work that Costa Ricans don’t want to do anymore. Back in the early 90s, there were many people who were illegal and the government approved a special program to allow them to come legally. [It] was the first time I saw xenophobia and assumptions that people repeated during this polarization.
I decided to collect all the data on how these people were impacting the country, from education to labor. Take crime, for instance: I was able to demonstrate with data that Nicaraguans were committing fewer crimes in proportion to their population than Costa Ricans. That was an amazing fact because people thought that they were to blame for everything that was happening in the country.
I combined the data analysis with a month in Nicaragua traveling around, because when you do an analysis first and then you go and do reporting, it’s like you’re navigating with a compass. I knew exactly that there were two towns in Nicaragua that I had to visit because many of the immigrants were from those towns. I think it helped to elevate the level and the quality of the conversation in the country on immigration.
This interview is part of our Summer Data Journalism Series where we speak with data journalists based in Chicago and beyond about their work and challenges with data. The interview has been edited for clarity and length.