Today many companies, government institutions, researchers and other organizations publish press releases with embedded data. These press releases contain graphics, and some also include tables with the underlying data. This is a good development. But if you want journalists to write about your numbers, you have to make their work as easy as possible. Here are some rules for publishing numbers.
The pdf plague
When I entered journalism some years ago, I found it striking that press releases in many cases are delivered in the form of pdf files. For a lot of reasons, this is a bad practice.
If you've done any text editing, you know what a mess copying text from a pdf file can be. This applies even more to tables in pdf files. In order to produce charts and graphics, you need your data in a clean spreadsheet or text file. But copying data out of a pdf table into, for example, an Excel sheet almost never results in an immediately usable table.
Sometimes you can clean up the copy-pasted data with some formulas or, in the worst cases, with manual data cleaning. Faster options, however, are Tabula and pdftable, tools designed specifically for extracting data from pdf files.
But even these tools don't always succeed in getting clean data out of pdf files. So instead of putting the numbers in a table in a pdf, release them as an Excel file or as a flat text file, for example a csv file.
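As a minimal sketch of what such a flat text file can look like, the following Python snippet writes a small table to csv using only the standard library. The column names, the figures and the function name `to_csv_text` are invented for illustration and do not come from any real dataset.

```python
import csv
import io

# Hypothetical example data: column names and values are made up.
rows = [
    {"country_code": "BE", "year": 2015, "unemployed_persons": 1000},
    {"country_code": "NL", "year": 2015, "unemployed_persons": 2000},
]


def to_csv_text(records):
    """Return the records as csv text: one header row, then one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)
```

A file like this opens directly in a spreadsheet program or a charting tool, with no copy-pasting or cleanup in between.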
Clean data publishing
So releasing data in a spreadsheet or a text file is rule number one. Here are the others.
- Send a spreadsheet file along with the press release. Alternatively, you could link to a file on your website (especially if the file is large), or deep-link to the data in your data portal (if you have one).
- Give all the data, not only the data you think is the most interesting. You might be surprised by what insights others gain from your data and how they visualize it.
- If you make calculations (for example, percentages), also provide the raw numbers.
- Provide metadata (how the numbers were gathered and calculated, in what way the data is similar to or different from other datasets, ...).
- Make graphics to illustrate the biggest stories contained in the data. You can use the Data Visualization Checklist to make clear graphics.
- If you want the media to be creative with your data, provide it in advance, under embargo if needed, so they can analyze, enrich and visualize it.
- Provide the right attribution of the data.
- Send along the contact information of the person in charge of the data.
- Don’t merge cells. Sorting and other manipulations people may want to apply to your data assume that each cell belongs to one row and column.
- Don’t mix data and metadata (e.g. date of release, name of author) in the same sheet.
- The first row of a data sheet should contain column headers. None of these headers should be duplicates or blank. The column header should clearly indicate which units are used in that column, where this makes sense.
- The remaining rows should contain data, one record (observation) per row. Don’t include aggregate statistics such as TOTAL or AVERAGE. You can put aggregate statistics in a separate sheet, if they are important.
- Numbers in cells should just be numbers. Don’t put commas in them, or stars after them, or anything else. If you need to add an annotation to some rows, use a separate column.
- Use standard identifiers: e.g. identify countries using ISO 3166 codes rather than names.
- Don’t use only colour or other stylistic cues to encode information. If you want to colour cells according to their value, use conditional formatting.
- Leave the cell blank if a value is not available.
- If you provide pivot tables, make sure the underlying data is available separately too.
- If you also want to create a human-friendly presentation of the data, do so by creating another sheet in the same workbook and referencing the appropriate cells in the canonical data sheet.
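Some of the spreadsheet rules above can be checked automatically before a file goes out. The following Python sketch is one possible way to do that for a csv export. It only covers three rules (no blank or duplicate headers, plain numbers in numeric columns, no aggregate rows), and the function name `check_sheet`, the list of aggregate labels and the sample data are assumptions made for illustration. The number check is deliberately loose: it accepts anything Python's `float()` can parse.

```python
import csv
import io

# Assumed labels that mark an aggregate row; extend as needed.
AGGREGATE_LABELS = {"total", "average", "sum", "mean"}


def check_sheet(csv_text, numeric_columns):
    """Return a list of human-readable problems found in the csv text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []

    # The first row must contain non-blank, unique column headers.
    if any(not name.strip() for name in header):
        problems.append("blank column header")
    if len(set(header)) != len(header):
        problems.append("duplicate column header")

    for line_no, row in enumerate(data, start=2):
        # Aggregate statistics belong in a separate sheet.
        if any(cell.strip().lower() in AGGREGATE_LABELS for cell in row):
            problems.append(f"row {line_no}: aggregate row")
        for name in numeric_columns:
            if name not in header:
                continue
            index = header.index(name)
            cell = row[index] if index < len(row) else ""
            if cell == "":
                continue  # a blank cell means "value not available"
            try:
                float(cell)
            except ValueError:
                problems.append(
                    f"row {line_no}: {name!r} is not a plain number: {cell!r}"
                )
    return problems


# Made-up sample with a thousands separator, a star and a TOTAL row.
sample = 'country_code,persons\nBE,"1,234"\nNL,5678*\nTOTAL,6912\n'
problems = check_sheet(sample, ["persons"])
for problem in problems:
    print(problem)
```

Rules such as merged cells or colour-only encoding cannot be seen in a csv export, so they still need a manual look at the original workbook.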
Spread the word
If you think these rules are useful and could make your work easier, please send them in reply to press releases that don't follow them.