Why the Government scraped itself

We wrote last month about Alphagov, the Cabinet Office’s prototype, more usable, central Government website. It made extensive use of ScraperWiki.

The question everyone asks – why was the Government scraping its own sites? Let’s take a look.

In total, 56 scrapers were used. You can find them tagged “alphagov” on the ScraperWiki website. There are ten more not yet in use, making 66 in total. They were written by 14 different developers from both inside and outside the Alphagov team – more on that process another day.

The bulk of scrapers were there to migrate and combine content, like transcripts of ministerial speeches and details of government consultations. These were then imported into sections of alpha.gov.uk - speeches are here, and the consultations here.
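To give a flavour of what one of these content-migration scrapers does, here is a minimal Python sketch. The HTML structure and field names below are invented for illustration – they are not alpha.gov.uk's actual markup or the real scrapers' code:

```python
# Minimal sketch: scrape a (made-up) listing page of ministerial
# speeches into rows suitable for a dataset. The HTML structure and
# field names are illustrative assumptions, not Alphagov's markup.
import re

SAMPLE_HTML = """
<ul class="speeches">
  <li><span class="date">2011-05-03</span>
      <a href="/speech/1">Speech on open data</a></li>
  <li><span class="date">2011-05-10</span>
      <a href="/speech/2">Speech on digital services</a></li>
</ul>
"""

ROW = re.compile(
    r'<span class="date">(?P<date>[\d-]+)</span>\s*'
    r'<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>'
)

def scrape_speeches(html):
    """Return one dict per speech -- the rows a scraper would save."""
    return [m.groupdict() for m in ROW.finditer(html)]

rows = scrape_speeches(SAMPLE_HTML)
print(rows)
```

A real scraper would fetch the live page and save each row to a datastore rather than printing it, but the shape is the same: messy HTML in, uniform records out.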

This is the first time, as far as I know, that the Government has organised a cross-government view of speeches and consultations. (Although third parties like TellThemWhatYouThink have covered similar ground before.) This is vital to citizens who don’t fall into particular departmental categories, but want to track things based on topics that matter to them.

The rest of the scrapers were there to turn content into datasets. You need a dataset to make something more usable.

Two examples:

1. The list of DVLA driving test centres has been turned into the beginnings of a simple app to book a driving test. Compare to the original DfT site here.

2. The UK Bank Holiday data that ScraperWiki user Aubergene scraped last year was improved and used for the alpha.gov.uk Bank Holiday page.
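Once data like the bank holidays is scraped into a structured form, reusing it is trivial. A small sketch of the idea – the rows below are a hand-typed sample in a structure of my own invention, not the actual table ScraperWiki stored for the alpha.gov.uk page:

```python
# Sketch of reusing a scraped bank-holiday dataset: given today's date,
# find the next holiday. The rows are a hand-typed illustrative sample,
# not the real ScraperWiki data behind the alpha.gov.uk page.
from datetime import date

BANK_HOLIDAYS = [
    {"date": date(2011, 4, 22), "name": "Good Friday"},
    {"date": date(2011, 4, 25), "name": "Easter Monday"},
    {"date": date(2011, 5, 2),  "name": "Early May bank holiday"},
    {"date": date(2011, 5, 30), "name": "Spring bank holiday"},
]

def next_holiday(holidays, today):
    """First holiday on or after `today`, or None if none remain."""
    upcoming = [h for h in holidays if h["date"] >= today]
    return min(upcoming, key=lambda h: h["date"]) if upcoming else None

print(next_holiday(BANK_HOLIDAYS, date(2011, 4, 23)))
```

This is exactly why a dataset beats a web page: the same few rows can power a "next bank holiday" widget, a calendar feed, or anything else.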

It seems strange at first for a Government to scrape its own websites. It isn’t, though: it lets them move quickly (agile!) and concentrate first on the important part – making the experience for citizens as good as possible.

And now, thanks to Alphagov using ScraperWiki, you can download and use all the data yourself – or repurpose the scraping scripts for something else.

Let us know if you do something with it!
