Tumblelog by Soup.io

August 10 2012

15:43

Two linked data journalism workshops for hacks and hackers

Following on from the sold-out Data Journalism Camp in 2011, DEN has combined forces with the MADE project to offer two linked workshops this autumn.

  • DJCAMP2012, with Paul Bradshaw and Megan Knight, is aimed at journalists who want to turn data into compelling stories. It runs from 9:30am on Friday, September 21 to 5pm on Saturday, September 22 in the Media Factory on the University of Central Lancashire's Preston Campus.
  • If you want to learn how to build your own data scraper and have at least a basic understanding of the programming languages Ruby and/or Python, there's a four-hour Scraping Masterclass with ScraperWiki founder Julian Todd from 9:30am to 1:30pm on Saturday, September 22, also in the Media Factory.

Both DJCAMP2012 and the Scraping Masterclass are co-sponsored by the MADE Project and the School of Journalism, Media and Communication at UCLan.

More information and registration details are available here.

February 10 2011

17:07

Spot and Normalize Inconsistent Measures

Here’s an example of why you have to be very careful when scraping, and why run-of-the-mill tools that make assumptions about the data won’t cut it:

One of our super-users, Julian Todd, decided to scrape the Vehicle Certification Agency (VCA) website on new car fuel consumption and exhaust emissions figures. And he spotted this:

And another search resulted in this:

Yes, that’s a change from milligrams per km to grams per km, noted only in the header.
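Spotting a switch like this by eye is easy to miss across hundreds of result pages. A minimal sketch of the "spot" step, assuming each page's headers have already been scraped into a list of strings ending in a unit suffix such as " mg km" or " g km" (modelled on the keys the scraper uses); the column names and pages below are hypothetical:

```python
import re
from collections import defaultdict

# Assumed header convention: "<name> <unit> km", with unit "mg" or "g".
UNIT_RE = re.compile(r"^(?P<name>.*?) (?P<unit>mg|g) km$")

def find_mixed_units(headers_per_page):
    """Return {column name: sorted list of units} for columns whose unit varies."""
    units = defaultdict(set)
    for headers in headers_per_page:
        for header in headers:
            m = UNIT_RE.match(header)
            if m:  # headers without a recognised unit suffix are ignored
                units[m.group("name")].add(m.group("unit"))
    return {name: sorted(found) for name, found in units.items() if len(found) > 1}

# Two hypothetical result pages whose CO column uses different units.
pages = [
    ["Model", "CO mg km", "CO2 g km"],
    ["Model", "CO g km", "CO2 g km"],
]
print(find_mixed_units(pages))  # {'CO': ['g', 'mg']}
```

Flagging the column rather than silently converting it leaves the decision about how to normalize to whoever writes the scraper.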

In ScraperWiki we can normalize this in standard Python code:

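# Some result pages report figures in mg/km rather than g/km: rename
# those fields and convert their values so every record uses g/km.
for key in list(data.keys()):  # copy the keys, since data changes in the loop
    if key.endswith(" mg km"):
        nkey = key[:-len(" mg km")] + " g km"
        v = data.pop(key)
        if v is None:
            data[nkey] = None
        else:
            data[nkey] = float(v) / 1000  # 1 g = 1000 mg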

This is from the scraper:
http://scraperwiki.com/scrapers/vca-car-fuel-data/
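The same normalization can also be written as a pure function driven by a table of divisors, which makes it easier to test and to extend if other units turn up. This is a sketch, not the scraper's actual code: the function name, the divisor table and the sample record are hypothetical, and it assumes the same " mg km" / " g km" key suffixes.

```python
# Hypothetical table: divide a value carrying this key suffix to get g/km.
DIVISOR_TO_G_KM = {" mg km": 1000, " g km": 1}

def normalize_record(record):
    """Return a copy of record with every unit-bearing field in g/km."""
    out = {}
    for key, value in record.items():
        for suffix, divisor in DIVISOR_TO_G_KM.items():
            if key.endswith(suffix):
                new_key = key[: -len(suffix)] + " g km"
                out[new_key] = None if value is None else float(value) / divisor
                break
        else:
            out[key] = value  # no unit suffix: copy the field unchanged
    return out

record = {"Model": "X", "CO mg km": "450", "CO2 g km": "139", "HC mg km": None}
print(normalize_record(record))
```

Because the function builds a new dict instead of popping keys from the one it is iterating over, it avoids the modify-while-iterating problem altogether.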
