Tumblelog by Soup.io
Newer posts are loading.
You are at the newest post.
Click here to check if anything new just came in.

November 24 2010

19:11

Hacks/Hackers London

First of all, the Iraq War Logs:

Round One – The Cleaning

Documents, records and words all hugely intimidating in their vastness. But some tools to help are MySQL, Ultraedit and Google Refine. But this stage is incredibly frustrating.

Round Two – The Problem

How do you tackle the types of documents? There was even small PDF files. Had to build a basic web interface for everyday queries. Needed multiple fields, this part is extremely difficult. Especially when you need to explain it to an editor. You have to have a healthy mistrust of the data. Asking the right questions is crucial. Asking something which the data is not structured to ask is the real problem.

Round Three – What We Did

Looked at key incidences and names of interest which the media had previously reported. Trick was to try and find what we didn’t know. First start by looking at categories of deaths by time. Found that it was murders rather than weapons fire that killed the most. It was the own civilian in-fighting. Use Tableau. Up to 100,000 records. Also had to get researches to sift through reports and manually verify what the data meant. Make sure if you do that that you organise a system that everyone uses to categories, calculate and tabulate. Can then use Exel and filter. Quicker with Access.

Data was used as part of research not just to make loads of charts. Visual maps tell a story. Quite powerful to an audience. Maps can be used for newsgathering. Asked journalists which areas they were interested in and sent them the reports geocoded. They could read up on the plane all the reports in the area they were heading to. Can also link a big story to it’s log. Prove it to be true. The log can validate a report, so you can use it.

What Did it Take?

10 week. 25 people. 30,000 reports. 5,000 reports manually recounted. More than one 18-hour day.

ScraperWiki

A lot of really useful information is not easily available on the web. Writing a web scraper not only makes searching and viewing information better but it can bring stories to light which were hidden in the mass of digital structures.

Don't be the product, buy the product!

Schweinderl