Data Journalism Survey Results: Crunching the Numbers

September 12, 2016 by Nate Williams

During our summer research project, we’ve had the pleasure of interviewing more than 15 data journalists about their work. These conversations have produced fascinating stories and taught us a lot about how journalists work with data, but we also wanted to put some numbers to them. Enter our Data Journalism Survey.

This past month, we surveyed a sample of 27 data journalists, asking them about the data tools and languages they use, their common sources of data, the volume and format of their data, and the kinds of data tasks they perform. In some ways, the results from this survey support key points highlighted in our interviews (e.g., PDFs are notoriously hard to extract data from), but they’ve also thrown us a few surprising twists.

If you’d like to play with the raw data, you can download the full set of survey results here.

The data journalists who participated in our survey range from freelancers to reporters and investigative journalists at organizations like CNN, The Chicago Reporter, and the World Economic Forum. Of those who responded, 40.7% say they’ve been working in data journalism for 1–3 years, and another 40.7% for 4–10 years.
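With only 27 respondents, a single person moves a percentage by roughly 3.7 points. The figures in this post are therefore often easiest to read as head counts. The sketch below converts between percentages and counts. It assumes that all 27 respondents answered a given question, which the post doesn't state for every item. The helper names `share` and `respondents_for` are our own.

```python
# Minimal sketch: convert between respondent counts and reported percentages,
# ASSUMING all 27 respondents answered the question in question.
SAMPLE_SIZE = 27  # from the post: "we surveyed a sample of 27 data journalists"


def share(count: int, total: int = SAMPLE_SIZE) -> float:
    """Percentage of respondents, rounded to one decimal place."""
    return round(100 * count / total, 1)


def respondents_for(percent: float, total: int = SAMPLE_SIZE) -> int:
    """Whole-number count whose rounded share is closest to `percent`."""
    return min(range(total + 1), key=lambda n: abs(share(n, total) - percent))


if __name__ == "__main__":
    print(share(1))               # one respondent's weight: 3.7 points
    print(respondents_for(40.7))  # 11 respondents behind "40.7%"
```

The same helper maps 85.2% to 23 of 27 respondents. No whole count out of 27 rounds to 79%, though: the closest, 21, gives 77.8%. Some percentages in this post may therefore rest on fewer than 27 answers.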

Here are six points that stand out in the survey results.

Figure: Data journalists’ length of experience

1. Excel is the most commonly used data tool

In our interviews, most of the data journalists we spoke with had at least some working knowledge of programming languages such as Python, R, and SQL. However, the survey results show that Excel is the most commonly used tool overall.

Of the survey respondents, 85.2% use Microsoft Excel, and 26% say it’s the tool they use “most.” Several respondents also named other spreadsheet tools (e.g., Google Sheets) as the tool they use most. Given Excel’s status as a jumping-off point for data wrangling and analysis, it’s perhaps not surprising that it’s used so widely. As Alex Richards, one of the journalists we spoke with, puts it, Excel is “the one program I always come back to.”
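The gap between the 85.2% who use Excel and the 26% who use it “most” comes from asking two different kinds of question. One allows several answers; the other allows only one. The sketch below tallies both kinds from survey rows stored as CSV. The column names (`tools_used`, `tool_used_most`), the semicolon separator, and the three sample rows are all hypothetical. They are not the layout or contents of the downloadable survey file.

```python
import csv
import io
from collections import Counter

# HYPOTHETICAL export: "tools_used" is a semicolon-separated multi-select
# answer; "tool_used_most" is a single choice. Toy rows, not survey data.
SAMPLE_CSV = """respondent,tools_used,tool_used_most
1,Excel;Python;SQL,Python
2,Excel;Google Sheets,Google Sheets
3,Excel;R,Excel
"""


def tally(rows):
    """Return (uses, most) Counters for the two hypothetical columns."""
    uses = Counter()
    most = Counter()
    for row in rows:
        # A respondent counts once for every tool they tick...
        for tool in row["tools_used"].split(";"):
            uses[tool.strip()] += 1
        # ...but only once overall for the tool they use most.
        most[row["tool_used_most"].strip()] += 1
    return uses, most


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
uses, most = tally(rows)
n = len(rows)
for tool in sorted(uses):
    print(f"{tool}: used by {uses[tool]}/{n}, used most by {most[tool]}/{n}")
```

Each respondent contributes exactly one “most” answer, so those shares sum to 100%. Multi-select shares can add up to far more. That is how a tool can be used by nearly everyone yet named as the favorite by only a minority.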

2. Data journalists use a broad range of tools

Overall, data journalists keep a large number of tools in their data toolkit. Excel may be widely used, but most data journalists have other tools they use more frequently, and which tools those are varies widely. For some, it’s MySQL or Postgres; for others, it’s RStudio or Tableau; and for still others, it’s Carto. The range spans desktop tools such as OpenRefine, programming languages and libraries such as Python and Pandas, and Web-based tools.

Figure: Data tools

3. Python, SQL, and JavaScript are widely used

One thing we noticed when talking with data journalists is just how many of them use Python for their data work. The survey confirms this: 79% of respondents use Python regularly. SQL is just as important, with the same 79% usage among respondents. JavaScript is close behind, with 75% of respondents saying they use it.

4. CSV is the most common data format

Almost all of the survey respondents (96.3%) say they receive data in CSV format, and about three-quarters (76.9%) say it’s the most common format they encounter. In the words of Abe Epton, “CSV is kind of like the lingua franca for a lot of data journalism.” The survey results bear that out.

5. Open data sources are used more often than FOIA requests

When we asked data journalists where they get their data, we often heard that Freedom of Information Act (FOIA) requests were involved, at either the state or federal level. The survey responses support this only in part. Among all survey respondents, 33.3% say they use FOIA requests “very often,” but nearly as many (25.9%) say they “never” use them.

Open data sources, however, are used “very often” by more respondents than FOIA requests are: more than half (55.6%) report that level of use. Among those using open data, some of the most common sources are the City of Chicago’s open data portal, World Bank data, and federal sources like the Census Bureau.

Figure: Data sources

6. Data journalists work mostly with small data sets

The data sets that survey respondents work with vary widely in size, from files with fewer than 100,000 records to files with more than 10 million. Still, 59.3% of the data journalists surveyed say that the data sets they work with are, on average, under 100,000 records.

Figure: Data volume
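It’s quick to check which side of the survey’s 100,000-record line a file falls on. The sketch below streams CSV text through Python’s standard `csv` module and counts data rows without loading the whole file into memory. The bucket labels are our own simplification. This includes the middle band between 100,000 and 10 million records, because the post doesn’t list the survey’s intermediate answer options.

```python
import csv
import io
from typing import Iterable


def count_records(lines: Iterable[str]) -> int:
    """Count data rows in CSV text, excluding the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header (no-op on empty input)
    # csv.reader joins quoted fields that span several lines,
    # so this counts records rather than physical lines.
    return sum(1 for _ in reader)


def size_bucket(records: int) -> str:
    """Place a record count in a SIMPLIFIED version of the survey's range."""
    if records < 100_000:
        return "under 100,000 records"
    if records <= 10_000_000:
        return "100,000 to 10 million records"
    return "more than 10 million records"


if __name__ == "__main__":
    # Toy input: the quoted "note" field contains an embedded newline.
    sample = io.StringIO('id,note\n1,"line one\nline two"\n2,plain\n')
    n = count_records(sample)
    print(n, size_bucket(n))
```

For a real file, pass the open file handle instead. For example, `open("my_data.csv", newline="", encoding="utf-8")` uses a hypothetical file name, and `newline=""` is what the `csv` module documentation recommends.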