Journalists need to "know about" data journalism

February 14, 2014

This article was written in preparation for the 'How data journalism affects the newsroom' session at DataDays 2014, February 17, Ghent.

While data journalism is growing fast and is already maturing in some parts of the world (mostly in the Americas), it is lagging behind in other regions. This is partly due to the language pool effect: big language pools (mainly English and Spanish) allow for big newsrooms, and only big newsrooms can afford the investments needed to make data journalism an integral part of their business.

But there is also a lot of potential for data journalism in smaller newsrooms in smaller language pools, like the newsrooms in Flanders, Wallonia and other regions and countries.

We can't expect medium-sized and small newsrooms to suddenly start contracting coders, visual designers and data geeks. Neither can we expect journalists working in those newsrooms to acquire all the skills necessary to produce and publish data stories. So those newsrooms shouldn't aim at "big data journalism"; they should aim for something smaller.

These are the things I think are necessary for incorporating data journalism in smaller newsrooms.

Knowing about

In any case, journalists should get rid of their phobia of everything related to numbers, statistics and databases. A course in basic Excel functionality is an excellent first step (and by the way: did you know Excel is also a good tool for making visualizations?).

I stress the word basic here, because I think journalists shouldn't become experts in any of the disciplines of data journalism. However, they should become aware of what data is, how it is structured, gathered, stored and edited.

They should know what tools exist and what each tool can do. They don't need to learn to operate all the data tools: a basic tool like Excel can get you quite far. But knowing what is out there and what each tool makes possible will help journalists realize how data journalism can improve their stories and give them leads to more original stories.

If a journalist would like to publish an interactive bar chart within an article, he should know about Datawrapper.

If a journalist has a list of addresses, he needs to know about Google Fusion Tables, which can turn the addresses into dots on a map.

If a journalist wants to get data out of an online database, he should know about scraping and about the tools you can use to get the data out.
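To make "url parameters" a little less abstract, here is a minimal sketch in Python of what they look like. Many online databases show a different page of results when one value in the web address changes, and that is often the first thing a scraper relies on. The address data.example.org and the parameter names q, page and per_page are invented for illustration; a real database will use its own.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical online database: the address and parameter names are made up.
BASE_URL = "https://data.example.org/search"


def build_page_url(query, page, per_page=50):
    """Return the web address of one page of search results."""
    params = {"q": query, "page": page, "per_page": per_page}
    # urlencode turns the dictionary into "q=...&page=...&per_page=..."
    return BASE_URL + "?" + urlencode(params)


def read_params(url):
    """Split a web address back into its parameters (all values are strings)."""
    query_string = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query_string).items()}


# The addresses of the first three result pages for one search.
urls = [build_page_url("road accidents", page) for page in range(1, 4)]
for url in urls:
    print(url)
```

Nothing is downloaded here. The point is only that a list of result pages is often just the same address with one number changed, which is useful to know when describing a scraping job to someone else.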

He should know about CSV files, URL parameters, Excel formulas, pivot tables, OpenRefine, integers and strings, treemaps, slopegraphs and maybe even D3.js.

He should not be an expert or daily user of any of these things. He just needs to know about them, know these things exist and have an understanding (vague or not) of what they do and how they can help him in his work.
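As an illustration of how a few of these terms fit together, the sketch below reads a tiny CSV file, shows why the difference between strings and integers matters, and builds the kind of summary a pivot table gives you in Excel. The figures are invented sample data, and the helper names read_rows and pivot_sum are made up for this example.

```python
import csv
import io
from collections import defaultdict

# Invented sample data, in the shape of a small CSV file.
CSV_TEXT = """city,year,accidents
Ghent,2012,120
Ghent,2013,95
Antwerp,2012,210
Antwerp,2013,180
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # Every value read from a CSV file is a string. "120" + "95" would
        # give "12095", so convert to an integer before adding.
        table[row[row_key]][row[col_key]] += int(row[value_key])
    return {name: dict(columns) for name, columns in table.items()}


rows = read_rows(CSV_TEXT)
pivot = pivot_sum(rows, "city", "year", "accidents")
totals = {city: sum(years.values()) for city, years in pivot.items()}

print(pivot)
print(totals)
```

In Excel the same summary takes a few clicks. Knowing that it is called a pivot table is what lets you find out how.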

When he feels the need, the journalist can dive a little deeper into a subject or a tool (read documentation, experiment, learn from examples) and deepen his data skills. Willingness to learn is important here.

When things become too complicated, he should be able to explain to others, in technical terms, what it is he would like to accomplish. In that case, newsrooms should contract freelancers and consultants in the field of data and visualization, who are popping up at a regular pace now. These "datafreelancers" can perform scraping, cleaning, wrangling and visualization of data, according to the well-described needs of the newsroom.

"Knowing about" applies to data itself and to data tools, but it also applies to data sources: journalists need to know where to get their data from and how to search for data. They don't need to be daily users of data platforms, but they have to know what platforms exist and what kind of data they serve.

How to "know about"

How do we get journalists to "know about" data and data journalism? There is a lot we can do.

First, as I wrote before, a course in Microsoft Excel is a good start for journalists.

Second is learning from the masters. Data journalism is a young, hip and dynamic discipline, with a lot of media publishing innovative pieces. A lot of big media have datablogs these days, on which they publish their work, the tools they work with, workflows, how-to's and more. A lot of the new datafreelancers also maintain a blog on their website (like this one here).

Having someone with a less journalistic and more data-related background in the newsroom could be a big win. When hiring new people, ask about their data skills (or more importantly: their willingness and ability to pick up these skills).

The institutions offering journalism education have a big role to play here. Every graduating journalism student should at least know some Excel and should have some understanding of data, data tools, data visualization and data stories. Giving the students lots of examples and having them evaluate these examples is a start; letting them produce actual data stories is, of course, better.

Availability and accessibility of more good quality (open) data will encourage journalists and newsrooms to invest in data skills.

But the most important thing is to conquer the data fear and just start doing data journalism. Start with small datasets, simple tools and simple visualizations like bar and line charts, and build from there to bigger datasets, maps, interactive charts, more advanced tools and more exotic visualizations.

Know about and know the language

Doing this, you'll run into problems almost immediately and you'll feel like you're stuck. But don't get discouraged. It is almost certain that someone has run into the same problem before. You only have to know about a tool that you think can solve your problem and search for a solution, using the right technical terms.