
June 29 2013


June 21 2013


Data Science London 12th June – a speaker speaks

Data Science London run an approximately monthly programme of evening events comprising short talks, beer and pizza. Last week I was invited to give a talk on Scraping and Parsing PDF using Python.

The venue for these events is the Westminster Hub in central London – we were diverted in our approach by the premiere for Man of Steel in Leicester Square.

The audience was large, friendly and very diverse. Most, if not all of the audience, were highly technical. There were men in suits and ties, people with piercings, t-shirts and shorts. There were academics, web developers, economists, political science students.

There were four speakers on the evening:

  1. Rosaria Silipo from Knime presented on using their platform to process social discourse data from Slashdot, analysing it for sentiment and for user roles in the community. The content from Slashdot acted as a substitute for content from a telecoms forum which their commercial clients were interested in.
  2. I spoke on scraping and parsing PDF files, giving some details of the Python libraries we commonly use and illustrating with examples from my Royal Society membership list parsing and the verbatim records of the UN General Assembly and Security Council. I’ll write about this second project another time. The audience were very responsive (they laughed at my jokes) and there were some good questions at the end.
  3. Third up, after a brief pause for me to fetch a beer and wind down, was Doug Cutting, inventor of Hadoop, who now works for Cloudera. He spoke about adding search capabilities to Lucene and Hadoop. I suspect he may have been the reason for the packed house.
  4. Finally, Ian Ozsvald from Mor Consulting spoke about brand name disambiguation for Twitter, i.e. knowing when someone is talking about Apple the brand or apple the fruit. There are tools for this type of problem but they appear to have been trained on longer form media and so do not perform well with short form sources such as Twitter. Ian demonstrated an approach using the scikit-learn machine learning package for Python. This was a work in progress, for which he is looking for collaborators.

All in all a very enjoyable and interesting evening. I can heartily recommend Data Science London events if you get a chance to go.

Finally a big thank you to Carlos for organising such a great event.


Full ‘New Zealand’ House!


May 27 2013


NetSquared Local: Week of May 27, 2013


This is a big week for nonprofit tech folk who want to get together face-to-face. Saturday, June 1 is the National Day of Civic Hacking and Net2 organizers in Honolulu and Boston are jumping into the fray!

The Commonwealth Nations are also going big with events in Canada (Peterborough), New Zealand (Wellington), Australia (Melbourne) and the United Kingdom (Manchester).

read more

May 14 2013


Alexia Foundation Hosts Exhibition: Eyes on the World

Awa Balde, 5, cries after being circumcised. Once a girl passes through the rite of circumcision, she is considered a respectable prospect for marriage. A future husband will sometimes pay a dowry to claim a bride before she becomes a teenager. Photo by Ami Vitale courtesy of Alexia Foundation

The Alexia Foundation will host the opening night of “Eyes on the World” at 25CPW Gallery on June 20th, 2013 in New York, New York.

The evening will feature current work on the clothes factory collapse in Bangladesh, an intimate look at domestic violence, and 2011 Alexia Foundation grant award winner Amanda Berg’s story on teenage binge drinking entitled “Keg Stand Queens.”

Event Details

“Eyes on the World”
25CPW Gallery (located at 62nd Street and Central Park West)
Thursday, June 20th, 2013
6:00 to 9:00pm

“Eyes on the World” will be on display at 25CPW Gallery Friday – Sunday, June 21st to June 23rd, 2013.

A lecture by photographer Amanda Berg will be open to the public on Saturday, June 22nd, 2013 at 4:00pm.


Admission is free. Donations to support The Alexia Foundation are encouraged. Please RSVP to sarahbeth@alexiafoundation.org or 718-753-7607.

More Information

For more information, see the Alexia Foundation event page.

March 25 2013


Young Rewired State: 2013 Festival of Code

Guest post by Kaitlin Dunning from Young Rewired State


Young Rewired State is a network of software developers and designers aged 18 and under. It is the philanthropic arm of Rewired State and its primary focus is to find and foster the young children and teenagers who are driven to teach themselves how to code, how to program the world around them. The aim is to create a worldwide, independent, mentored network of young programmers supported – and supporting each other – through peer-to-peer learning. Ultimately, young developers can be solving real-world challenges.

The Festival of Code is our annual celebration of everything code. It takes place all over the UK every year in the first full week of August, and ends with a long weekend at the Custard Factory in Birmingham, with everybody coming together showcasing the amazing achievements. This year, the dates are 5-11 August, and we’re aiming to have 60 centres around the UK, with 1000 kids participating!


Participating in the Festival of Code is the best way to get to know how we work and to become a part of the community  –  whether as a young person, mentor or host centre.
The mentor community is a huge part of the success of the Festival of Code. Traditionally, it has been drawn from the Rewired State network, but as the popularity of the week has grown, so has the mentor network. Indeed, some of our mentors will be YRS alumni who are aged over 18, and therefore too old to be a YRS participant. We hope that as the years go by, our mentor numbers will grow at the same rate as new attendees.

The role of the mentor is manifold, and includes: providing expertise in programming, design, presentation skills, agile, ideation, robotics, open data, open government data, and graphics. It also involves assisting the centre lead in looking after the room, alongside assessing skills and encouraging collaboration.

To help grow the mentor network, we would like to ask the ScraperWiki community if there are any people here interested in getting involved. If you are interested in learning how you can connect with Young Rewired State (as a centre, mentor, or sponsor) please email kait@rewiredstate.org.

Tags: events

September 05 2012


MediaStorm 2013 Workshop Dates Announced

Methodology photo

We are excited to be entering into our sixth training year at MediaStorm. Each year our workshops attract leading industry professionals looking to advance their multimedia and storytelling skills. We’ve now had more than 100 participants come through our professional workshops and we continue to be humbled by how much they take away from the experience.

John Temple, now managing editor at the Washington Post, told us that our Methodology Workshop gave him, “time to stop and step into a different way of seeing journalism.” He said, “I came away inspired by what’s possible if we commit to a different way of thinking about stories.”

After taking our Storytelling Workshop, Simon Schorno, head of media relations for the International Committee of the Red Cross (ICRC) in North America, had the following to say, “The passion of the entire MediaStorm crew for documentary storytelling, their professionalism, their willingness to share what they know and their commitment to help the team produce something we could all be proud of were outstanding.”

Each year our workshop participants remind us what an exciting time it is for our industry and how important it is to keep learning and innovating.

In 2013 we’ll be offering three Methodology, three Storytelling, and four One-day workshops at our studio in Brooklyn, NY. We are looking forward to another exciting, innovative and challenging training year. We hope you’ll be able to join us.

MediaStorm Workshop Dates 2013

MediaStorm provides intensive, hands-on educational experiences through our One-day, Methodology, Storytelling and Traveling Workshops. We’ll be offering the following courses in 2013:

January 12 One-day Workshop
January 14-18 Methodology Workshop
March 23-29 Storytelling Workshop
April 20 One-day Workshop
July 20-26 Storytelling Workshop
August 12-16 Methodology Workshop
September 21 One-day Workshop
October 19 One-day Workshop
November 2-8 Storytelling Workshop
December 9-13 Methodology Workshop

Applications are now open. Apply now.

About Our Workshops

MediaStorm offers an array of in-person workshops and online training opportunities to meet your learning needs.

MediaStorm One-day Workshop
One-day overview session focused on the art of digital storytelling.

MediaStorm Methodology Workshop
This workshop is tailored to professionals who want to integrate MediaStorm methods into their curriculum or approach to storytelling.

MediaStorm Storytelling Workshop
Collaborate with a team to research, shoot and produce a documentary project in just one week. Work as a field reporter, editor or observer as part of a crew dedicated to the telling of one story. See products from previous MediaStorm Storytelling Workshops.

Online Training
If you’re not able to join us in Brooklyn this year, consider signing up for a one-year subscription to our Online Training. Pay just one fee for more than six hours of video tutorials with MediaStorm staff on reporting and post-production.

We hope you can join us for another great year of workshops in 2013!

Fall 2012 Workshops

We have three remaining workshops in 2012:

October 20 One-day Workshop *Apply by Sept. 28
November 3-9 Storytelling Workshop *Apply by Sept. 12
December 10-14 Methodology Workshop *Apply by Nov. 9

Fall 2012 application deadlines are approaching. Apply now.

Learn more about our upcoming 2012 and 2013 workshops at mediastorm.com/train.

September 03 2012


NetSquared September update

September is here, and even though the new year doesn't come for another four months I still think of September as a time of rebirth and renewal (I'll always think like a student, I guess).

read more

August 10 2012


FotoVisura Pavilion in San Juan Announced

FotoVisura is delighted to announce The 2012 FotoVisura Pavilion in San Juan, Puerto Rico. This four-day contemporary photography event celebrates international and local photography through exhibitions, panel talks, and a portfolio consultation and review.

The event will run from August 30 to September 2, 2012 and will feature exhibitions including “Beyond War: Mexico’s Drug War” by Guest Curator Whitney Johnson, director of photography at The New Yorker, “Falling Eyelids: A Fotonovela” written, directed and photographed by ADÁL, and “Interpolations” featuring Migdalia Luz Barens-Vera.

Exhibitions at the 2012 FotoVisura Pavilion

Beyond War: Mexico’s Drug War
Guest Curator Whitney Johnson, Director of Photography at The New Yorker

Featuring: Eunice Adorno (Mexico City, Mexico), Dominic Bracco II (Texas, US), Alejandro Cartagena (Santo Domingo, Dominican Republic), and Katie Orlinsky (New York, US).

Description: In “Beyond War: Mexico’s Drug War” four young photographers share their perspectives on the ongoing conflict. Each looks beyond the graphic violence to examine the impact of the war on particular communities.

Falling Eyelids: A Fotonovela
Written, Directed and Photographed (1979) by ADÁL (Puerto Rico)

Description: “Falling Eyelids” tells the story of a photographer who, dissatisfied with his surrounding reality, begins to create his own. In time, the photographer is unable to distinguish the reality he invented from the one that was real.

Selected Portfolio (Travel Size)
Featuring: Jesús ‘Bubu’ Negrón (Puerto Rico)
Curated by: Jesús ‘Bubu’ Negrón & Roberto Paradise

Description: Selected Portfolio (Travel Size) by Jesús ‘Bubu’ Negrón is composed of 14 photographs laminated on wood that offer a breathtaking journey into the artist’s practice for the past 10 years.

See the full list of exhibitions and events at www.fotovisurapavilion.com.

Interested in participating in the FotoVisura Pavilion?

Contact info@fotovisura.com. Great discount rates are available with the Sheraton Puerto Rico Hotel and Casino and Delta Airlines.

More details available at www.fotovisurapavilion.com.

May 03 2012


Join Aday.org on May 15

On May 15th Aday.org asks you and people all around the world to pick up your cameras and picture what is close to you. In this unique photographic event, hundreds of thousands of people will work together to create a unique documentation of daily life.

Professionals, amateurs, school children, farmers, social media fans, astronauts and office workers. Cell phone camera, Hasselblad, homemade or borrowed. Aday.org is looking for the perspectives of everyone who enjoys photography. The goal is to inspire perspectives on humankind – today and tomorrow.

All images will be displayed online for you and everyone to explore. Some of them will be selected for a book, others will be displayed in digital exhibitions. Every single one will be saved for future research and inspiration.

Let a part of your life inspire generations to come. Share your perspective! Read more about the project and sign up to participate.

Take your photos on May 15th and upload them between 15th and 22nd May.

Go to Aday.org.

April 12 2012


Dart Society Benefit and Auction May 17

Dart Society Invitation

The Dart Society will be holding a benefit and auction at the gallery 25CPW in New York City on Thursday, May 17, beginning at 6 p.m. The benefit will honor Joel Simon of the Committee to Protect Journalists, photojournalist Lynsey Addario, and writer and documentarian Sebastian Junger for their commitment to excellence and compassion in the coverage of conflict, trauma and social injustice.

Tickets to the event, which will feature live and silent auctions of photography and books by Dart Society members and their colleagues, are $75 for one ticket and $100 for two. Auction items may be viewed here. Proceeds from ticket sales and auction items will be used to support the Dart Society’s outreach programs to journalists covering trauma and to support Dart Society Reports, which showcases the best work in trauma journalism.

Please click on the invitation above (or here) to purchase tickets or to make a donation; if you’d like to send a check, visit the Dart Society website.

January 17 2012


news:rewired – media in motion is now sold out, here is what delegates can look forward to

Tickets for news:rewired – media in motion have now sold out.

Essential information:

  • Time: 9am for registration, please arrive by 9.30 for the start of the conference. The final session will finish at 5.15pm, followed by networking drinks until 8pm.
  • Venue: MSN HQ, Cardinal Place, 100 Victoria Street, London SW1E 5JL – see a map and a picture of the easy-to-spot building.
  • Nearest tube: Victoria (Victoria line, Circle line and District line)
  • Hashtag: #newsrw
  • Packing list: Don’t forget to bring laptop and phone chargers

As tickets have now sold out, what treats are in store for delegates attending the digital journalism conference on Friday, 3 February?

The one-day conference on the latest trends in digital journalism will open with a keynote speech from Liz Heron, social media editor at the New York Times, who will give delegates a taster of social media strategy from across the pond, outlining how the title taps into social networks for newsgathering and community engagement.

The remainder of the day will feature a total of six sessions and three workshops for delegates to choose from. See the agenda for full details.

You can attend:

1A: Online video - with: Christian Heilmann, Mozilla Popcorn, @codepo8; Adam Westbrook, multimedia journalist, blogger and lecturer, @AdamWestbrook; Josh de la Mare, editor of video, Financial Times. More speakers to be announced.


1B: Paid-for content models – with: François Nel, researcher, academic and consultant on newsroom and digital business innovation, @francoisnel; Tom Standage, digital editor, the Economist, @tomstandage; Chris Newell, founder, ImpulsePay.

2A: Mobile reporting – with: Paul Gallagher, head of online content, the Manchester Evening News, @pdgallagher; Nick Martin, Sky News correspondent, @NickMartinSKY; Ben Fawkes, audio content manager, SoundCloud, @benfawkes; Christian Payne, social technologist, mobile story maker, @Documentally.


2B: Social media optimisation – with: Nate Lanxon, editor, Wired.co.uk, @NateLanxon; Chris Hamilton, social media editor, BBC News, @chrishams; Martin Belam, user experience lead, the Guardian, @currybet; Darren Waters, head of devices and social media, MSN UK, @darrenwaters.

Workshop A: Search engine optimisation skills – with: Malcolm Coles, digital production director, nationals, Trinity Mirror, @malcolmcoles.


Workshop B: Data journalism tools – with Simon Rogers, editor, Guardian Datablog and Datastore, @smfrogers, and Andy Cotgreave, senior product consultant, Tableau, @acotgreave.


Workshop C: Searching social media for news – with Nicola Hughes, Knight-Mozilla Fellow, the Guardian @DataMinerUK.

3A: Gaming mechanics in news – with: Bobby Schweizer, doctoral student at the Georgia Institute of Technology and co-author of Newsgames: Journalism at Play, @NewsgamesGT; Shannon Perkins, editor of interactive technologies, Wired.com; Al Trivino, director of innovation at News International, @alfredotrivino; Alastair Dant, interactive lead at the Guardian, @ajdant.


3B: Multiplatform strategy – with: Mike Goldsmith, editor-in-chief of iPad and tablet editions, Future Publishing, @mikegoldsmith; Douglas Arellanes, technologist, consultant and the director of clients and services, Sourcefabric, @dougiegyro; the Guardian (speaker tbc). More speakers to be announced.

The final session will bring the whole conference together for a debate on setting social media standards – with: Laura Kuenssberg, business editor, ITV News, @ITVLauraK; Neal Mann, digital news editor, Sky News, @fieldproducer; Katherine Haddon, head of online, English, AFP, @khaddon; Tom McArthur, UK editor, Breakingnews.com, @TomMcArthur.

A drinks reception at the end of the conference will provide a chance to network.

December 01 2011


Seize the Data Day with Open Knowledge Foundation

On December 3rd at the Barbican Centre in London, the Open Knowledge Foundation will be inviting everyone and anyone who can wrangle up a good bit of data or wrangle the wranglers of data, to a Seize the Data event.

So they could do with some screen scraping, data extracting, coding extraordinaires (i.e. you).

So if you’re free and happen to be in London, please park your diggers outside their door and start turning the scraper cogs towards evil PDFs.

Sign up here.

September 19 2011


Help Get Olympic Data off the Start Line

As part of Media2012 we’ll be running (no pun intended) a Hacks and Hackers Data Journalism workshop.

It’s part of the Abandon Normal Devices Festival. It’ll be on 2nd October from 11:00-17:00 at FACT (Foundation for Art and Creative Technology) Medialab, 88 Wood Street, Liverpool, L1 4DQ.

So if you’re interested in sports data and want to see times, points and medal tables get off the line then come on down.

To book email hello@andfestival.org.uk

Most importantly, beer and pizza will be provided!

So watch out London 2012, you’re being ScraperWikied!

September 16 2011


Driving the Digger Down Under


Henare here from the OpenAustralia Foundation – Australia’s open data, open government and civic hacking charity. You might have heard that we were planning to have a hackfest here in Sydney last weekend. We decided to focus on writing new scrapers to add councils to our PlanningAlerts project that allows you to find out what is being built or knocked down in your local community. During the two afternoons over the weekend seven of us were able to write nineteen new scrapers, which cover an additional 1,823,124 Australians – a huge result.

There are a number of reasons why we chose to work on new scrapers for PlanningAlerts. ScraperWiki lowers the barrier of entry for new contributors by allowing them to get up and running quickly with no setup – just visit a web page. New scrapers are also relatively quick to write which is perfect for a hackfest over the weekend. And finally, because we have a number of working examples and ScraperWiki’s documentation, it’s conceivable that someone with no programming experience can come along and get started.

It’s also easy to support people writing scrapers in different programming languages using ScraperWiki. PlanningAlerts has always allowed people to write scrapers in whatever language they choose by using an intermediate XML format. With ScraperWiki this is even simpler because as far as our application is concerned it’s just a ScraperWiki scraper – it doesn’t even know what language the original scraper was written in.
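To make the idea of an intermediate format concrete, here is a minimal sketch in Python of turning scraped records into XML. The element names (`applications`, `application`) and the record fields are hypothetical, chosen only for illustration; they are not the actual PlanningAlerts schema.

```python
import xml.etree.ElementTree as ET


def applications_to_xml(records):
    """Serialise a list of dicts into a simple XML document.

    The element names used here are illustrative assumptions,
    not the real PlanningAlerts format.
    """
    root = ET.Element("applications")
    for record in records:
        application = ET.SubElement(root, "application")
        for field, value in record.items():
            # One child element per field, e.g. <address>...</address>
            ET.SubElement(application, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


# A made-up record standing in for the output of any scraper
records = [
    {
        "council_reference": "DA/2011/001",
        "address": "1 Example Street, Sydney NSW",
        "description": "Demolition of existing shed",
    }
]

xml_text = applications_to_xml(records)
print(xml_text)
```

Because the consumer only ever sees the serialised output, a scraper written in any other language could emit the same structure and be treated identically.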

Once someone has written a new scraper and formatted the data according to our needs, it’s a simple process for us to add it to our site. All they need to do is let us know, we add it to our list of planning authorities and then we automatically start to ask for the data daily using the ScraperWiki API.

Another issue is maintenance of these scrapers after the hackfest is over. Lots of volunteers only have the time to write a single scraper, maybe to support their local community. What happens when there’s an issue with that scraper but they’ve moved on? With ScraperWiki anyone can now pick up where they left off and fix the scraper – all without us ever having to get involved.

It was a really fun weekend and hopefully we’ll be doing this again some time. If you’ve got friends or family in Australia, don’t forget to tell them to sign up for PlanningAlerts.


OpenAustralia Foundation volunteer

July 25 2011


For the Texas Tribune, “events are journalism” — and money makers

Texas Tribune Festival logo

When Evan Smith helped launch the nonprofit Texas Tribune in 2009, he set out to get people engaged in their government again, especially in places where newspaper coverage has dwindled. The Tribune introduced blogs, multimedia, troves of government data, and something old-fashioned for an online news startup: face-to-face conversations.

The Tribune has hosted more than 60 public events — all free — attracting top influencers, big audiences, and hundreds of thousands of dollars in corporate sponsorships. Now the Tribune is blowing up the event and throwing The Texas Tribune Festival, a weekend of ideas for policy wonks, lobbyists, and anyone else invested enough in local government to pay $125 for a ticket.

“Events are journalism — events are content. And in this new world, content comes to you and you create it in many forms,” says Smith, the Tribune’s chief editor and chief executive.

One goal: to combat low levels of public engagement on a lot of the issues the event will address. “We think much of the technology world embraces ‘push’ as opposed to ‘pull’ as a way to reach people,” Smith says. “We are taking a ‘push’ approach to content, and that means going to people with content where they live.”

The speaker list includes top names in the universe of Texas politics: energy tycoon T. Boone Pickens, former U.S. Education Secretary Margaret Spellings, San Antonio Mayor Julián Castro. And the topics covered are also the Tribune’s core coverage areas: health and human services, energy and the environment, public and higher education, and race and immigration.

Evan Smith

If that all sounds familiar, it’s because the idea is modeled on the New Yorker Festival. In 2009, Smith hired the person who created that festival, Tanya Erlach, the former senior talent manager for The New Yorker. (“She’s not reinventing the wheel; this is her wheel,” Smith says.) Erlach handles everything from programming to logistics.

Smith is the first to admit that events don’t only produce journalism. They also produce revenue. And even the free events, including the TribLive speaker series, have been money-makers. They are cheap to produce, for one thing, and often underwritten by corporate sponsors. Smith estimates the Tribune raised about $650,000 in corporate support last year, which includes events. He expects to raise $1.3 million this year. While major gifts from philanthropists represented almost all of the Tribune’s revenue in 2009, Smith expects more financial diversity in 2011, with income from philanthropy, corporations, and events evenly split. Altogether, the Tribune has raised $9.3 million in barely two years — far more than like-minded nonprofit startups elsewhere.

“A lot of better established nonprofit news organizations — and I’m not counting the public broadcasting TV and radio stations but the sites that are similar to ours, ones that have been in existence longer — really have not approached the task of soliciting corporate support, underwriting, and sponsorships. We’ve just not seen other folks approach this, and they started to call us and ask us and our folks, you know, ‘How are you doing this?’”

If journalism is to survive, Smith says, business must be in the DNA. It’s in the Tribune’s DNA. Another Tribune co-founder was a venture capitalist, John Thornton, who initially raised $4 million in startup funding, including $1 million of his own cash and a large grant from the Knight Foundation. While Smith does not handle fundraising, he does reach out to executives personally to solicit their support.

Is Smith sheepish about that? “Hell, no.” Is there a conflict of interest? “Our only bias is in favor of Texas.” Public radio and television, he points out, rely heavily on corporate underwriting. The Tribune is neither paying people to speak at the festival nor covering their expenses. And the only reward for a corporate sponsorship is “a handshake and a tax letter,” he says.

“The work we do is important. And it needs to be paid for,” Smith explains. “There are appropriate sources of revenue out there. There is nothing to be ashamed of when putting a ‘for sale’ sign on as much stuff as possible, provided that it doesn’t have a negative impact on the work that you do or doesn’t create a negative perception of your integrity.”

Besides the financial value of the Tribune’s events, Smith says, there’s also value in the B word — you know, the word that tends to be uncomfortable in journalism circles. “Just as some other organizations may shrink from associating with corporate interests, there are some organizations, I suspect…that don’t fully appreciate the value of branding,” he says. A big festival is a platform for the Tribune to present itself as a grown-up operation, to build credibility and attract new readers.

Tickets went on sale July 11, with a discount for Texas Tribune contributors. Smith is working out a deal with sponsors to make admission free for college students.

July 21 2011


CIJ Workshop in Blog Form!

Here’s an introduction to a little thing called ScraperWiki. Please watch it so that you don’t develop bad habits that could annoy the programmers you want helping you!


  • Have your own ScraperWiki account and understand all the features and navigation of the site
  • Scrape twitter for different terms and export the dataset
  • Query datasets on ScraperWiki using SQL and learn about our API
  • Create a table from the data in a scraper and understand how views work
  • The main objective is to understand ScraperWiki capabilities and potential!

Exercise 0. Make an account

Go to http://scraperwiki.com and make an account. You’ll need this to save your scrapers. We encourage you to use your full name, but it’s not obligatory! Before you start you may like to open a second tab or window in your browser, as that will allow you to go back and forth between the blog instructions and the exercise. The exercises are all written using the Python programming language.


You’ll have your very own ScraperWiki account (lucky you!). Welcome to your dashboard. Please fill out your profile when you have time. You’ll automatically get email updates on how your scrapers are doing. We’ll show you all the features you have, where your data goes and how to schedule your scraper.

The following scrapers you’ll be copying and pasting can be found on my profile page: http://scraperwiki.com/profiles/NicolaHughes/

Exercise 1. Basic data scraping

We’ll start by looking at a scraper that collects results from Twitter. You can use this to store tweets on a topic of your choice into a ScraperWiki database.

1. Edit and run a scraper

  1. Go to http://scraperwiki.com/scrapers/basic_twitter_scraper_2/.
  2. Click Fork Scraper. You’ve now created a copy of the scraper. Click Save Scraper located on the right side of the window to save the scraper.
  3. Position your cursor on Line 11 between the two speech marks ' '. This will allow you to edit the query to a Twitter search term of your choice. For example, if you want to find tweets about an event, type in the event hashtag, e.g. QUERY = '#cij2011'.
  4. Click Run and watch the scraper run.
  5. Save the scraper. Click Save Scraper located on the right side of the window to save the scraper.

2. Download your scraped data

  1. On the top right-hand corner there are 4 tabs – you are in the Edit (Python) tab, click the Scraper tab.  You will see all of your data in tabular format.
  2. Click on ‘Download spreadsheet (CSV)’ to download and open your data as a spreadsheet.  If you have Microsoft Excel or Open Office you can analyse that data using standard spreadsheet functions.
  3. Double-click on your spreadsheet to open it.
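If you would rather analyse the download in Python than in a spreadsheet, a minimal sketch along these lines may help. The column names (`id`, `from_user`, `text`) and the sample rows are assumptions made for illustration; check the header row of your own CSV file for the real ones.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for the downloaded file
SAMPLE_CSV = """id,from_user,text
1,alice,Off to #cij2011
2,bob,Data session at #cij2011
3,alice,Great talk at #cij2011
"""


def tweets_per_user(csv_file):
    """Count how many rows each user contributed."""
    reader = csv.DictReader(csv_file)
    return Counter(row["from_user"] for row in reader)


counts = tweets_per_user(io.StringIO(SAMPLE_CSV))
for user, total in counts.most_common():
    print(user, total)
```

To run it against a real download, replace `io.StringIO(SAMPLE_CSV)` with a file opened using `open("your_file.csv", newline="")`.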


Congratulations! You have created your very own Twitter scraper (*applause*) that doesn’t depend on the twitchy Twitter API. You’ve scraped your data and added it to the data store. You’ve also taken the data into a CSV file. The scraper is scheduled to run daily and any new data will be added automatically. Check out the video on how to change the schedule.
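One way a scraper can run every day without piling up duplicates is to store each row under a unique key. The sketch below illustrates that idea with Python's built-in sqlite3 module rather than ScraperWiki's own datastore; the table layout, column names and sample tweets are all assumptions for the sake of the example.

```python
import sqlite3


def save_tweets(conn, tweets):
    """Insert or replace tweets keyed by id, so a re-run never duplicates a row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS swdata "
        "(id TEXT PRIMARY KEY, from_user TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO swdata (id, from_user, text) "
        "VALUES (:id, :from_user, :text)",
        tweets,
    )
    conn.commit()


def count_rows(conn):
    """Return the number of rows currently stored."""
    return conn.execute("SELECT count(*) FROM swdata").fetchone()[0]


# Made-up results standing in for two consecutive daily runs
first_run = [
    {"id": "1", "from_user": "alice", "text": "Off to #cij2011"},
    {"id": "2", "from_user": "bob", "text": "Data session at #cij2011"},
]
second_run = [
    {"id": "2", "from_user": "bob", "text": "Data session at #cij2011"},  # seen before
    {"id": "3", "from_user": "carol", "text": "Slides from #cij2011"},  # new
]

conn = sqlite3.connect(":memory:")
save_tweets(conn, first_run)
save_tweets(conn, second_run)
print(count_rows(conn))  # 3: the repeated tweet is stored only once
```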

Exercise 2. Analysing some real-life data

In this exercise, we’ll look at a real-life use of ScraperWiki.

The Press Complaints Commission doesn’t release the details of the complaints in a way that is easy to analyse, and it doesn’t release many statistics about its complaints.

It would be interesting to know which newspaper was the most complained about.

However, as one of our users has scraped the data from the PCC site it can be analysed – and crucially, other people can see the data too, without having to write their own scrapers.

Here is the link: http://scraperwiki.com/scrapers/pcc-decisions-mark-3/

There’s no need to fork it. You can analyse the data from any scraper, not just your own.

As the Open Knowledge Foundation say, ‘The best use of your data is one that someone else will find’.

Instead, we are going to create a new view on the data.  However we are not going to create a view from the beginning – we will fork a view that has already been created!  Go to http://scraperwiki.com/views/live_sql_query_view_3/, fork the view and save it to your dashboard.  You can also change the name of the view by clicking on the top line beside your name “Live SQL Query View” and change it to “My analysis of the PCC Data”.  Save it again by clicking ‘Save’.  Take a few moments to study the table and pay particular attention to the column headings.

  1. There are four tabs on the top right hand corner of the screen – click the ‘View’ tab.  This will take you to the ScraperWiki test card.  Click the centre of the test card “Click to open this view”.
  2. Using this SQL query view, find out which publications get the most complaints.
  3. Place the cursor in the ‘SELECT’ box, delete the ‘*’ and click on the word ‘publication’, which appears on the 2nd line of the yellow box to the right, on the line under swdata (tip: the yellow box contains all of the column headings in the table).  This will transfer the word ‘publication’ into your SELECT box.  Position your cursor to the right of the word ‘publication’ and type in ‘, count(publication)’. This creates a column that contains a count of the number of times each publication appears in the dataset created by the original scraper.
  4. Place your cursor in the ‘GROUP BY’ box, and repeat the process above to select the word ‘publication’ from the yellow box to the right. The GROUP BY statement will group together the publications and so give you the aggregated number of times each publication has appeared in the PCC.
  5. Place your cursor in the ‘ORDER BY’ box, remove the existing text and then type ‘count(publication) desc’. The ORDER BY keyword is used to sort the result-set. This will put the result-set in order of how many times the publication has received a complaint with the most complained about publication appearing at the top.
  6. In the ‘LIMIT’ box type ‘10’ to see the top 10.
  7. Hit the ‘Run SQL query’ button and see your results.  At the bottom of the column the last item is ‘full query’ – yours should read as follows:  (it may be truncated)

SELECT publication, count(publication) FROM swdata GROUP BY publication ORDER BY count(publication) DESC LIMIT 10

It should look like this!

This is a simple query showing you a quick result.  The query is not being saved.  You can use your browser’s Back and Forward buttons to see both screens; however, as soon as you go back to the test card and click “Click to see view”, the query will be reset.
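If you would like to try the same query away from the view, here is a minimal sketch using Python’s standard sqlite3 module. The table name swdata and the publication column come from the exercise; the sample rows are made up purely for illustration and are not real PCC figures.

```python
import sqlite3

# Made-up sample rows standing in for the scraped PCC data (not real figures).
SAMPLE_ROWS = [
    ("Paper A",), ("Paper B",), ("Paper A",),
    ("Paper C",), ("Paper A",), ("Paper B",),
]

# The query from the exercise, unchanged.
QUERY = (
    "SELECT publication, count(publication) FROM swdata "
    "GROUP BY publication ORDER BY count(publication) DESC LIMIT 10"
)


def most_complained_about(rows):
    """Load rows into an in-memory swdata table and run the exercise query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE swdata (publication TEXT)")
    conn.executemany("INSERT INTO swdata (publication) VALUES (?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for publication, complaints in most_complained_about(SAMPLE_ROWS):
        print(publication, complaints)
```

GROUP BY collapses the rows to one per publication, count() tallies how many rows went into each group, and ORDER BY ... DESC puts the biggest tally first.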

For some really great SQL tutorials check out: http://www.w3schools.com/sql/default.asp

Your Challenge:

Now let’s find out who has been making the most complaints – not receiving them.  You will be altering the query to find out which complainants have made the most complaints.


So you have just learned a bit about SQL, which is very simple and very powerful. You can query any open data set in ScraperWiki to find the story. Welcome to the wonderful realm of data journalism. Hop aboard our digger and explore.

Exercise 3. Making views out of data

Making a view is a way to present raw data within ScraperWiki.  It allows you to do things with the data like analyse it, map it, or create some other way of showing the data.

For example, in the previous exercise we used the view to create a simple SQL Query.  The view produced a table but it did not save the results of the query.    In this exercise we are going to make a table that gives you the latest result of your query every time you open it up!  In effect you will be creating your own live league table.

  1. Fork and save to your dashboard: http://scraperwiki.com/views/top_10_receivers_of_cabinet_office_money/
  2. On line 59 change “cabinet_office_spend_data” to “pcc-decisions-mark-3”. This is re-directing the view code to the original PCC scraper.
  3. On line 60 change the var sqlselect = “ SELECT publication, count(publication) FROM swdata GROUP BY publication ORDER BY count(publication) DESC LIMIT 10 ” query to the SQL query you worked out in the previous exercise.
  4. Click the ‘PREVIEW’ button which is positioned to the right of the orange documentation button above the ‘console’ window to make sure you’re getting the table you want.
  5. Click the X on the top right hand corner of the View Preview Window.
  6. The Heading still refers to the old scraper!  Go to the edit view and click on line 21 and replace the old title with ‘Top Complaints by Publication to the PCC’
  7. Save the View and Preview.
  8. Go to line 18 and replace http://scraperwiki.com/scrapers/cabinet_office_spend_data/”>Cabinet Office Spend Data</a> with http://scraperwiki.com/scrapers/pcc-decisions-mark-3/”>Press Complaints Commission Data</a>. You can also change the ‘this page’ hyperlink from <a href=”http://www.cabinetoffice.gov.uk/resource-library/cabinet-office-spend-data”> to <a href=”http://www.pcc.org.uk/”>.
  9. Save the view and preview.
  10. Click View to return to the test card and scroll to the bottom of the screen where you will see the paragraph heading – ‘Add this view to your web site’.   You could copy the code and add it to your web site and have the data linked directly. If you don’t have a site, this view is where you can return to get your live league table that will update with your data (wow)!

Note: there is no data stored in a view, only a link to a scraper that is looking at the data in the data store. This is why you can quickly and easily alter a view to look at another scraper at any time.  So you can build a viewer once and use it many times.
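The “build once, use many times” idea can be sketched in a few lines of Python. This is not the code of the ScraperWiki view itself: render_league_table and make_store are hypothetical helpers, and the in-memory SQLite databases with made-up rows simply stand in for two different scrapers’ data stores.

```python
import sqlite3
from html import escape


def render_league_table(conn, query, title):
    """Run `query` on `conn` and return the result as a small HTML table.

    The function keeps no data of its own; it only reads from the store
    it is pointed at, so it can be reused with any store.
    """
    cursor = conn.execute(query)
    headings = [col[0] for col in cursor.description]
    lines = ["<h1>%s</h1>" % escape(title), "<table>"]
    lines.append(
        "<tr>" + "".join("<th>%s</th>" % escape(h) for h in headings) + "</tr>"
    )
    for row in cursor:
        lines.append(
            "<tr>" + "".join("<td>%s</td>" % escape(str(v)) for v in row) + "</tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


def make_store(publications):
    """Hypothetical stand-in for a scraper's data store (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE swdata (publication TEXT)")
    conn.executemany("INSERT INTO swdata VALUES (?)", [(p,) for p in publications])
    return conn


QUERY = (
    "SELECT publication, count(publication) AS complaints FROM swdata "
    "GROUP BY publication ORDER BY complaints DESC LIMIT 10"
)

# Same view code, two different stores.
html_a = render_league_table(make_store(["Paper A", "Paper A", "Paper B"]), QUERY, "Store A")
html_b = render_league_table(make_store(["Paper C"]), QUERY, "Store B")
```

Only the connection passed in changes between the two calls, which is the same move as changing the scraper name on line 59 of the view.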

Check out what the Media Standards Trust made from our PCC scrapers! They made the Unofficial PCC.

Your Challenge:

Build a table of the top 10 receivers of Cabinet Office Money! The original scraper here: http://scraperwiki.com/scrapers/cabinet_office_spend_data/.

So go to your SQL viewer and change the scraper name to “cabinet_office_spend_data” i.e. the name of the scraper for our internal API is the string of letters after ‘scrapers/’ in the URL of the scraper.

Create your query (Hint: you’ll want to look at the ‘Refined’ table as that has the names of the suppliers cleaned to be the same for each spelling, just hit the word ‘Refined’ and the table that is selected for your query will appear in yellow. You’ll also want to use ‘sum’ instead of ‘count’ to sum up the money not how many times the supplier got paid). Then make your table view.
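To see why the hint asks for ‘sum’ rather than ‘count’, here is a small Python sketch with made-up payments. The column names supplier and amount are assumptions chosen for illustration; read the real ones from the yellow box when you write your own query.

```python
import sqlite3

# Made-up payments. The column names `supplier` and `amount` are assumptions.
PAYMENTS = [
    ("Supplier X", 100.0),
    ("Supplier Y", 900.0),
    ("Supplier X", 150.0),
    ("Supplier X", 50.0),
]


def rank(aggregate):
    """Rank suppliers by count(amount) or sum(amount), highest first."""
    if aggregate not in ("count", "sum"):
        raise ValueError("aggregate must be 'count' or 'sum'")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE refined (supplier TEXT, amount REAL)")
    conn.executemany("INSERT INTO refined VALUES (?, ?)", PAYMENTS)
    query = (
        "SELECT supplier, %s(amount) AS total FROM refined "
        "GROUP BY supplier ORDER BY total DESC LIMIT 10" % aggregate
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


by_count = rank("count")  # how many times each supplier was paid
by_sum = rank("sum")      # how much money each supplier received
```

In this toy data Supplier X is paid most often, but Supplier Y receives the most money, so the two aggregates give different league tables.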

If you want to keep your PCC table just fork it, change the name in the title, save it and change the code to look at “cabinet_office_spend_data” and your query.

Hint: the original table you forked will have the answers!

Also check out this view of the same data set: http://scraperwikiviews.com/run/cabinet_spending_word_cloud_date_slider/

And this view of HMRC spending data that was made in a couple of hours from forking my code: http://scraperwikiviews.com/run/hmrc_spending_pie_chart_date_slider/


You’ll have a live league table that feeds off your scraped data! You’ll see how our setup allows you to constantly view changing data on the web and tell you the stories as they are developing. So you can get data out but you can keep it in and have lots of different views interrogating your data within ScraperWiki.

So let’s work on the journalism in ‘data journalism’ and not just the data. It’s not what you can do for your data; it’s what your data can do for you!

So let’s start scraping with ScraperWiki!

Note: Answers in screencast form will be put on the blog in two weeks’ time. So no excuses!

July 11 2011


Investigative Journalism, ScraperWiki style!

Just to let all hacks and journo hackers out there know, we’ll be running a ScraperWiki workshop at the Centre for Investigative Journalism Summer School. So get tickets whilst you can! The lineup looks amazing and the ScraperWiki gurus Nicola Hughes (@DataMinerUK) and Anna Powell-Smith (@darkgreener) will be at the summer school the whole time pottering around so do grab us.

We’ll be driving our digger for nearly 2½ hours on the Saturday (16th July). There’ll be a pit stop in between. You’ll get loads from the first session if that’s all the time you can give. Attending the first session will be mandatory for the second session. We would recommend you attend both to feel the full force of the ScraperWiki digger. You will be chauffeur driven however if you want driving lessons after the tour we’ll see if we can organize a two day workshop.

We hope to see you on board!

July 05 2011


Awesome Foundation Seattle Community Meeting

On Thursday, June 30, 20+ Seattle residents came out for the first local Awesome Foundation community meeting. On behalf of my co-organizer, Tommer Peterson, and myself, a big thanks to those who joined us.

We met coders, artists, activists, co-working enthusiasts, and at least one roboticist – a truly awesome mix.  Lots of people couldn’t attend, but wanted to get involved, so here’s our follow up post, as promised.

Read on for:

  • A quick overview of the meeting
  • Information about next steps
  • Notes from the Q&A session

If you already know for sure that you want to get involved and weren’t at the meeting, head over to our Awesome Foundation Commitment Form.  If you are new to Awesome Foundation Seattle, you can read our initial invitation post and my personal note about why I’m psyched to get AF Sea started.

Our Proposal

Tommer welcomes participants

After a getting-to-know you warmup, Tommer introduced the basic Awesome Foundation concept – 10 people (aka “Trustees”) giving $100 and collectively sharing a $1000 grant to the most awesome proposal each month. Awesome Foundation chapters are sprouting all over the world, and Seattle will probably be the 20th chapter.

I talked about a vision for community engagement beyond the basic giving model.  Once we get good at making grants, I’d love to discuss an awesome group blog for Seattle, highlighting everything that makes the city great and helping to identify potential grant applicants.  Maybe we could have awesome volunteer days or give a larger, special grant once a year.

There’s a lot of potential, and our direction will be determined collectively by those who get involved early.

To build that broader engagement, we want to shake up the basic model a bit.  Tommer and I proposed 4 basic participation levels:

  1. Full-time Trustees: people who can make the $100/mo commitment for the first consecutive 6 months. This group will form the foundation of the foundation, make the first critical decisions about how the chapter will operate, and review grant applications each month.
  2. Guest Trustees: for folks who want to participate at a lower financial level.  Guest Trustees join the full-time Trustees for at least 1 month out of the first 6 and review grant applications in those months when they are making a contribution.
  3. Friends of Awesome: aka “Volunteers!” A number of folks have expressed support for the Awesome Foundation idea, but are not able to participate financially. We do need volunteers in several capacities. Let us know if you would like to help design, build and manage our local WordPress blog; organize events; and/or support our efforts to publicize grant opportunities.
  4. Grant Applicants: The all-important piece of the puzzle.  We’ll always be looking for fresh, exciting proposals.

Next steps

A mingling of awesome

After the post-meeting mingle, everyone filled out a form indicating their level of commitment. What’s next?

Step 1) If you missed the meeting and want to get involved, it’s very important that you fill out the online Awesome Foundation Commitment Form.  Please fill it out by Monday, July 11.

Step 2) Tommer and I will take all of the input from the paper and online forms and do our best to put together a great mix of full-time and guest Trustees.  We’ll send invitations to join that first group and take final confirmations.

Step 3) Within a couple of weeks, we’ll announce our first group of Trustees and a calendar for future guest Trustees.

Step 4) Trustees will convene to decide and announce our grant-making calendar.

Step 5) The awesome commences – taking applications and making grants by the end of the summer.


Participants had lots of questions.  Tommer and I want to make sure everyone understands that we don’t hold all the answers.  Instead, we’ll be looking to our fellow Trustees and Friends of Awesome to guide the way as we get started in Seattle.

Q: What’s the mission statement of Awesome Foundation? What kind of work are you looking to fund?

A: Unlike most initiatives, AF doesn’t have a tight focus on any particular area of work.  Grants from other chapters have focused on the arts, technology, and fun community engagement.  In fact, there’s a new international chapter focused on Food.  The mission of Awesome Foundation Seattle will be as broad and deep as the imaginations of our Trustees, Friends and Applicants allow.  You can read the shared mission statement here and scan grants that have been given in other cities on the shared blog.

Q: What’s the decision-making process – quorum? majority of trustees? does it need to be unanimous?

A: Every chapter is free to choose its own process.  There is a draft Trustee manual that lays out decision models from several cities, and the first Seattle Trustees will have to decide how to decide.

Q: Would grant applicants be encouraged to reapply?

A: Yes!  Based on the experience of other chapters, we will want to stay in touch with applicants who don’t receive a grant in any given month and encourage them to keep old and new proposals flowing.

Q: Will there be parties?

A: Absolutely!  As often as we can, we’ll want to celebrate our grantees and invite more people to meet and mix with us to keep the awesome growing.


July 04 2011


We Eat Data – ScraperWiki talk at Open Knowledge Conference 2011

Our tamed computer programmer, ‘The Julian’, recently gave a rare appearance at the Open Knowledge Conference in Berlin (if you want an appearance pay us or ask us!). The spectacle of such scraping royalty drew more people than the room could accommodate (‘The Julian’ is not related to any royals living or deceased). As such I have included the slides here:

We were honoured to be amongst an outstanding line-up of speakers. We also ran a workshop the week of the conference and you can see the German data we scraped into ScraperWiki on the OKCon2011 tag.

What was most interesting about the workshop is that we see the same types of data needed for similar projects wherever we go. Tobias Escher wants to do something similar to AlphaGov for Germany called Meine Demokratie. A lot of very simple little scrapers can go a long way, and if there’s anyone looking to play around with scraping and ScraperWiki, or who would like to lend a coding hand to a worthy cause, please click the above link.

‘The Julian’ was also looking for a scraping challenge and the workshop gnomes found Berlin schools data. I showed those in attendance one of my favourite sites made from scrapers:  Schooloscope. So Julian is scraping the data for Berlin schools in various stages and the hope is to get all the data for schools in Germany to make a German schooloscope.

We have one lovely lady very interested in getting this project on its way so if you are willing, if you speak German and if you know where to find them maybe you can scrape German schools data.

So watch out, useful things to know in Germany, including schools – you’re being ScraperWikied!

(As ScraperWiki is being used for better and better things, this will just get harder for me…)

June 15 2011


Five ways we're trying to build the scope of our Net2 Local community in Manchester, UK

Over the past 18 months I've been organising the Manchester Net Tuesday group.  We've had speakers, discussions and collaborations on a number of issues relevant to the non-profit sector locally, from online fundraising to search engine optimisation to activism & campaigning - a really rewarding experience.

One thing we've struggled with is how to extend our group beyond the "usual suspects" of people already involved in the non-profit technology sector.  As much as we value our collective contributions, we all recognise the need to reach out to those in the sector that *really* need to take advantage of technology, yet might be reticent to do so.

So - on the 28th June, we are holding an event that represents a few departures from our norm.  I wanted to share five tactics we've taken to try and build the attendance of this event, based on previous experiences both in #mcrn2 and beyond.  Ultimately, we are trying to extend the scope of our membership, to build a wider platform for change.

1. A speaker from London!

This month, we have a guest speaker from The Charity Technology Trust, a TechSoup partner.  Granted, it's only two hours by train from Manchester to London, but billing the event as a "one-off" chance to hear and meet someone from an organisation not based in the city does provide some impetus.

Conversely however, some of our most productive meetups have been with very locally focused topics and case studies - so this is a constant dilemma to consider.

2. The time, the place

Normally, we meet at a hackspace venue on a Tuesday evening.  This has suited the focus of the group fine, but our June event will be held during an afternoon, at a more "mainstream" venue (kindly donated by the local digital development agency).  Again, signups have been strong.  The point here is that to engage people into our group, it may often be a good tactic to reach them in their own context to begin with!  This doesn’t mean we would move away from our regular home, but it seems useful to get out and about.

3. Mailing Lists!

I recently attended the first non-profit technology event in Blackpool (about 1 hour from Manchester).  It was similar to Net2 Local in approach and design (maybe a new member?) - but one thing struck me: the wide turnout from local non-profits and community groups.  Duncan and Lillian (the organisers) told me about the value of local email and offline mailing lists that had long been established and maintained for communications with the sector.  In other words, Twitter wasn't the answer when the target group didn't use Twitter! As with the point of our event timing, it is all about targeting.

4. Waiting Lists!

Another first was to use Eventbrite for signups, and include the function for Waiting Lists.  Recently, I missed out on the NotforProfit TweetUp in London, as I hadn't been quick enough to grab a ticket!  So, whilst not wanting to force people into a secondary market for tickets, we took the option to utilise Eventbrite in terms of administering the event - with lots of people signing up.

5. Social Media Surgeons

The final aspect we are trying is that of having a few "social media surgeons" on hand to offer one-to-one advice and support to participants, borrowing from the Social Media Surgery movement that is well established in the UK.  With our first four tactics seemingly working to engage a new audience, it will be vital that we offer some reason to come back.  A great aspect here is that some of the “usual suspects” I mentioned will be providing the surgery!

And so - there go our five tactics for extending the scope of our community.  I'll blog again post-event and onwards, but wanted to share these actions so far. Conveniently, I've (nearly) found a word beginning with T to describe each:

  • Topic
  • Timing
  • Targeting
  • Tickets
  • Tech Support

Ultimately, we want people to come to Manchester Net Tuesday regularly, and even add ideas and inspirations online.  Building on the foundations of the last 18 months, I'm hoping we can achieve this.

I'll stress that this is just something we are trying in Manchester, and it would be great to hear more from others.  What tactics have you taken to widen the scope of your group, if at all?  How do you engage people further?

Post a comment here, or maybe continue the conversation via @stevieflow and @mcrn2
