
February 10 2012

18:29

Meet us at the Computer-Assisted Reporting Conference

Our team is heading to St. Louis on Feb. 23 for the annual computer-assisted reporting conference, and we'd love to meet you. Come to our sessions or stop us in the hall to learn how we can help your newsroom and how to get involved.

October 04 2010

07:41

Where should an aspiring data journalist start?

In writing last week’s Guardian Data Blog piece on How to be a data journalist, I asked various people involved in data journalism where they would recommend starting. The answers are so useful that I thought I’d publish them in full here.

The Telegraph’s Conrad Quilty-Harper:

Start reading:

http://www.google.com/reader/bundle/user%2F06076274130681848419%2Fbundle%2Fdatavizfeeds

Keep adding to your knowledge and follow other data journalists/people who work with data on Twitter.

Look for sources of data:

The ONS stats release calendar is a good start: http://www.statistics.gov.uk/hub/release-calendar/index.html. Also look at the government data stores (Data.gov, Data.gov.uk, Data.london.gov.uk, etc.).

Check out WhatDoTheyKnow, Freebase, WikiLeaks, ManyEyes and Google Fusion Tables.

Find out where hidden data is and try to get hold of it: private companies looking for publicity, under-appreciated research departments, and public bodies that release data but not in a granular form (e.g. the Met Office).

Test out cleaning/visualisation tools:

You want to be able to collect data, clean it, visualise it and map it.

Obviously you need basic Excel skills (pivot tables are how journalists efficiently get headline numbers out of big spreadsheets).
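The pivot-table idea is worth seeing in code too. Here is a minimal sketch of the same aggregation in Python with pandas, which is my choice rather than anything named in the post; the spreadsheet it reads and its column names are invented for illustration.

```python
# A rough equivalent of an Excel pivot table in pandas: headline
# numbers from a big spreadsheet. The file name and column names are
# invented for illustration.
import pandas as pd

spending = pd.read_csv("council_spending.csv")  # hypothetical data file

# Total and average spend per department per year, much like dragging
# fields into a pivot table in Excel.
summary = pd.pivot_table(
    spending,
    values="amount",
    index="department",
    columns="year",
    aggfunc=["sum", "mean"],
)
print(summary)
```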

For publishing, just use Google Spreadsheets graphs, ManyEyes or Timetric. Google MyMaps coupled with http://batchgeo.com is a great beginner mapping combo.

Further on from that, you want to try out the Google Spreadsheets importURL service, Yahoo Pipes for cleaning data, Freebase Gridworks and Dabble DB.
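Tools like Yahoo Pipes and Freebase Gridworks are essentially about cleaning messy data. As a hedged sketch of what that clean-up looks like in code, here is the same kind of job in Python with pandas (again my choice, not the post's); the file and column names are hypothetical.

```python
# Rough sketch of the kind of clean-up Yahoo Pipes or Freebase
# Gridworks would do, written with pandas. The file and column names
# are invented for illustration.
import pandas as pd

grants = pd.read_csv("grants_raw.csv")  # hypothetical messy export

# Typical chores: trim stray whitespace, unify case, and turn
# "£1,200.50"-style strings into numbers you can actually add up.
grants["recipient"] = grants["recipient"].str.strip().str.title()
grants["amount"] = (
    grants["amount"]
    .str.replace("£", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)

# Quick headline: who received the most money?
print(grants.groupby("recipient")["amount"].sum().sort_values(ascending=False).head())
```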

For more advanced stuff, you want to figure out a query language and be able to work with relational databases, Google BigQuery, the Google Visualisation API (http://code.google.com/apis/charttools/), the Google Code playgrounds (http://code.google.com/apis/ajax/playground/?type=visualization#org_chart) and other JavaScript tools. The advanced mapping equivalents are ArcGIS or GeoConcept, allowing you to query geographical data and find stories.
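To get a feel for what "a query language plus a relational database" means in practice, here is a minimal sketch using SQLite from Python's standard library; the post names MySQL and BigQuery rather than SQLite, and the table and figures below are invented.

```python
# Minimal sketch of working with a relational database via SQL, using
# SQLite from Python's standard library. The table and numbers are
# invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

cur.execute("CREATE TABLE spending (department TEXT, year INTEGER, amount REAL)")
cur.executemany(
    "INSERT INTO spending VALUES (?, ?, ?)",
    [
        ("Health", 2009, 120.5),
        ("Health", 2010, 98.0),
        ("Transport", 2009, 40.0),
        ("Transport", 2010, 45.2),
    ],
)

# The kind of question a reporter might ask: how did each department's
# spending change year on year?
query = """
    SELECT department, year, SUM(amount) AS total
    FROM spending
    GROUP BY department, year
    ORDER BY department, year
"""
for department, year, total in cur.execute(query):
    print(department, year, total)

conn.close()
```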

You could also learn some Ruby for building your own scrapers, or Python for ScraperWiki.
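And a scraper can be very small indeed. The post only says "Python for ScraperWiki"; the sketch below uses the requests and BeautifulSoup libraries instead, which is my assumption, and the URL and page structure are hypothetical.

```python
# Minimal scraper sketch in Python, using requests and BeautifulSoup.
# The URL and HTML structure are invented for illustration; it assumes
# the page contains at least one HTML table.
import requests
from bs4 import BeautifulSoup

URL = "http://example.gov.uk/spending-table"  # hypothetical page

response = requests.get(URL, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Pull each row of the first HTML table into a list of cell values.
rows = []
for tr in soup.find("table").find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all(["td", "th"])]
    if cells:
        rows.append(cells)

for row in rows:
    print(row)
```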

Get inspired:

Get the data behind some big data stories you admire, try to find a story, visualise it and blog about it. You’ll find that the whole process starts with the data and your interpretation of it. That needs to be newsworthy/valuable.

Look to the past!

Edward Tufte’s work is very inspiring: http://www.edwardtufte.com/tufte/. His favourite data visualisation is from 1869! Or what about John Snow’s cholera map? http://www.york.ac.uk/depts/maths/histstat/snow_map.htm

And for good luck here’s an assorted list of visualisation tutorials.

The Times’ Jonathan Richards:

I’d say a couple of blogs.

Others that spring to mind are:

If people want more specific advice, tell them to come to the next London Hacks/Hackers and track me down!

The Guardian’s Charles Arthur:

Obvious thing: find a story that will be best told through numbers. (I’m thinking about quizzing my local council about the effects of stopping free swimming for children. Obvious way forward: get numbers for the number of children swimming before, during and after the free swimming offer.)

If someone already has the skills for data journalism (which I’d put at (1) understanding statistics and relevance (2) understanding how to manipulate data (3) understanding how to make the data visual) the key, I’d say, is always being able to spot a story that can be told through data – and only makes sense that way, and where being able to manipulate the data is key to extracting the story. It’s like interviewing the data. Good interviewers know how to get what they want out of the conversation. Ditto good data journalists and their data.

The New York Times’ Aron Pilhofer:

I would start small, and start with something you already know and already do. And always, always, always remember that the goal here is journalism. There is a tendency to focus too much on the skills for the sake of skills, and not enough on how those skills help enable you to do better journalism. Be pragmatic about it, and resist the tendency to think you need to know everything about the techy stuff before you do anything — nothing could be further from the truth.

Less abstractly, I would start out by learning some basic computer-assisted reporting skills and then move on from there as your interests/needs dictate. A lot of people see the programmer/journalism thing as distinct from computer-assisted reporting, but I don’t. I see it as a continuum. I see CAR as a “gateway drug” of sorts: once you start working with small data sets using tools like Excel, Access, MySQL, etc., you’ll eventually hit the limits of what you can do with macros and SQL.

Soon enough, you’ll want to be able to script certain things. You’ll want to get data from the web. You’ll want to do things you can only do using some kind of scripting language, and so it begins.
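As a hedged illustration of that first "script certain things" step, a few lines of Python using only the standard library can already fetch a dataset from the web and summarise it; the URL and column name below are hypothetical, not from the post.

```python
# A first scripting step beyond the spreadsheet: fetch a CSV from the
# web and compute a quick summary. The URL and column name are
# hypothetical.
import csv
import io
from urllib.request import urlopen

URL = "http://example.gov.uk/data/inspections.csv"  # hypothetical dataset

with urlopen(URL, timeout=30) as resp:
    text = resp.read().decode("utf-8")

# Count how often each outcome appears (column name is assumed).
counts = {}
for row in csv.DictReader(io.StringIO(text)):
    outcome = row["outcome"]
    counts[outcome] = counts.get(outcome, 0) + 1

for outcome, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(outcome, n)
```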

But again, the place to start isn’t thinking about all these technologies. The place to start is thinking about how these technologies can enable you to tell stories you would never otherwise be able to tell. And you should start small. Look for little things to start with, and go from there.
