August 02 2012

14:18

A case study in online journalism: investigating the Olympic torch relay

Torch relay places infographic by Caroline Beavon

For the last two months I’ve been involved in an investigation which has used almost every technique in the online journalism toolbox. From its beginnings in data journalism, through collaboration, community management and SEO to ‘passive-aggressive’ newsgathering, verification and ebook publishing, it’s been a fascinating case study in so many ways that I’m going to struggle to get them all down.

But I’m going to try.

Data journalism: scraping the Olympic torch relay

The investigation began with the scraping of the official torchbearer website. It’s important to emphasise that this piece of data journalism didn’t take place in isolation – in fact, it was while working with Jennifer Jones of Help Me Investigate the Olympics (coordinator for #media2012, the first citizen media network for the Olympic Games) and others that I stumbled across the torchbearer data. So networks and community are important here (more later).

Indeed, it turned out that the site couldn’t be scraped through a ‘normal’ scraper, and it was the community of the Scraperwiki site – specifically Zarino Zappia – who helped solve the problem and get a scraper working. Without both of those sets of relationships – with the citizen media network and with the developer community on Scraperwiki – this might never have got off the ground.

But it was also important to see the potential newsworthiness in that particular part of the site. Human stories were at the heart of the torch relay – not numbers. Local pride and curiosity were here – key ingredients of any local newspaper. And there were the promises made by its organisers – had they been kept?

The hunch proved correct – this dataset would just keep on giving stories.

The scraper grabbed details on around 6,000 torchbearers. I was curious why more weren’t listed – yes, there were supposed to be around 800 invitations to high profile torchbearers including celebrities, who might reasonably be expected to be omitted at least until they carried the torch – but that still left over 1,000.

I’ve written a bit more about the scraping and data analysis process for The Guardian and the Telegraph data blog. In a nutshell, here are some of the processes used:

  • Overview (pivot table): where do most come from? What’s the age distribution?
  • Focus on details in the overview: what’s the most surprising hometown in the top 5 or 10? Who’s oldest and youngest? What about the biggest source outside the UK?
  • Start asking questions of the data based on what we know it should look like – and hunches
  • Don’t get distracted – pick a focus and build around it.

This last point is notable. As I looked for mentions of Olympic sponsors in nomination stories, I started to build up subsets of the data: a dozen people who mentioned BP, two who mentioned ArcelorMittal (the CEO and his son), and so on. Each was interesting in its own way – but where should you invest your efforts?
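The overview and subset steps above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet workflow actually used: the records and the field names (`hometown`, `age`, `story`) are made up for the example, and `count_by` and `mentioning` are hypothetical helpers.

```python
import re
from collections import Counter

# Hypothetical records standing in for the scraped data; the field names
# are assumptions, not the torchbearer site's real schema.
torchbearers = [
    {"name": "A", "hometown": "London", "age": 34,
     "story": "Nominated by BP for her volunteering."},
    {"name": "B", "hometown": "Leeds", "age": 17,
     "story": "Coaches a youth football team."},
    {"name": "C", "hometown": "London", "age": 61,
     "story": "I have been engaged in the business of sport with adidas."},
    {"name": "D", "hometown": "Berlin", "age": 45, "story": ""},
]


def count_by(rows, field):
    """Pivot-table style overview: how many rows share each value of `field`."""
    return Counter(row[field] for row in rows).most_common()


def mentioning(rows, keyword):
    """Rows whose nomination story mentions `keyword` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [row for row in rows if pattern.search(row["story"])]


hometowns = count_by(torchbearers, "hometown")
oldest = max(torchbearers, key=lambda row: row["age"])
youngest = min(torchbearers, key=lambda row: row["age"])

# One subset per sponsor, so each can be weighed up before picking a focus.
sponsor_subsets = {
    sponsor: mentioning(torchbearers, sponsor)
    for sponsor in ("BP", "adidas", "ArcelorMittal")
}
```

Matching on whole words rather than plain substrings avoids short names such as BP turning up inside unrelated words.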

One story had already caught my eye: it was written in the first person and talked about having been “engaged in the business of sport”. It was hardly inspirational. As it mentioned adidas, I focused on the adidas subset, and found that the same story was used by a further six people – a third of all of those who mentioned the company.

Clearly, all seven people hadn’t written the same story individually, so something was odd here. And that made this more than a ‘rotten apple’ story, but something potentially systemic.
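Spotting a story shared by several torchbearers does not have to rely on it catching your eye. A minimal sketch, assuming each record has hypothetical `name` and `story` fields, groups torchbearers by their story text after tidying up case and whitespace:

```python
from collections import defaultdict


def normalise(story):
    """Lower-case the story and collapse runs of whitespace."""
    return " ".join(story.lower().split())


def shared_stories(rows, min_count=2):
    """Map each story used by at least `min_count` people to their names."""
    groups = defaultdict(list)
    for row in rows:
        story = normalise(row.get("story", ""))
        if story:  # ignore torchbearers with no story at all
            groups[story].append(row["name"])
    return {s: names for s, names in groups.items() if len(names) >= min_count}


# Made-up example rows, not real torchbearer data.
sample = [
    {"name": "A", "story": "Engaged in the business of sport."},
    {"name": "B", "story": "engaged in the  business of sport."},
    {"name": "C", "story": "Runs a local charity."},
    {"name": "D", "story": ""},
]
duplicates = shared_stories(sample)
```

Exact matching after normalisation only catches copy-and-paste reuse; lightly edited copies would need a fuzzier comparison.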

Signals

While the data was interesting in itself, it was important to treat it as a set of signals pointing to potentially more interesting exploration. Seven torchbearers having the same story was one of those signals. Mentions of corporate sponsors were another.

But there were many others too.

That initial scouring of the data had identified a number of people carrying the torch who held executive positions at sponsors and their commercial partners. The Guardian, The Independent and The Daily Mail were among the first to report on the story.

I wondered if the details of any of those corporate torchbearers might have been taken off the site afterwards. And indeed they had: seven disappeared entirely (many still had a profile if you typed in the URL directly – but could not be found through search or browsing), and a further two had had their stories removed.

Now, every time I scraped details from the site I looked for those who had disappeared since the last scrape, and those that had been added late.
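Comparing one scrape with the next comes down to set arithmetic on whatever uniquely identifies a torchbearer. The sketch below assumes each record carries a profile `url` and a `story` field; both names, and the example URLs, are hypothetical.

```python
def compare_scrapes(previous, current):
    """Return (disappeared, added, story_removed) as sorted lists of URLs."""
    prev = {row["url"]: row for row in previous}
    curr = {row["url"]: row for row in current}

    disappeared = sorted(prev.keys() - curr.keys())
    added = sorted(curr.keys() - prev.keys())
    # Still listed, but the story that used to be there is now blank.
    story_removed = sorted(
        url
        for url in prev.keys() & curr.keys()
        if prev[url]["story"].strip() and not curr[url]["story"].strip()
    )
    return disappeared, added, story_removed


# Made-up scrapes for illustration.
scrape_1 = [
    {"url": "/torchbearer/1", "story": "Club volunteer."},
    {"url": "/torchbearer/2", "story": "Sponsor executive."},
    {"url": "/torchbearer/3", "story": "Works for a partner."},
]
scrape_2 = [
    {"url": "/torchbearer/1", "story": "Club volunteer."},
    {"url": "/torchbearer/3", "story": ""},
    {"url": "/torchbearer/4", "story": "Added late."},
]
disappeared, added, story_removed = compare_scrapes(scrape_1, scrape_2)
```

Keeping every scrape, rather than overwriting the last one, is what makes this comparison possible at all.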

One, for example – who shared a name with a very senior figure at one of the sponsors – appeared just once before disappearing four days later. I wouldn’t have spotted them if they – or someone else – hadn’t been so keen on removing their name.

Another time, I noticed that a new torchbearer had been added to the list with the same story as the 7 adidas torchbearers. He turned out to be the Group Chief Executive of the country’s largest catalogue retailer, providing “continuing evidence that adidas ignored LOCOG guidance not to nominate executives.”

Meanwhile, the proportion of torchbearers running without any nomination story rose from just 2.7% of the 6,056 torchbearers in the first scrape to 7.2% of the 6,891 torchbearers in the last week. Across all torchbearers who had appeared at any point between the two dates – including those who had appeared and then disappeared – the figure was 8.1%.
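To put those percentages in terms of people, the arithmetic can be checked with a short sketch. The headcounts it produces are estimates only, because the published percentages are rounded; the `story` field name and the helper function are assumptions.

```python
def share_without_story(rows):
    """Percentage of rows whose nomination story is empty or blank."""
    missing = sum(1 for row in rows if not row.get("story", "").strip())
    return 100 * missing / len(rows)


# Approximate headcounts implied by the rounded percentages quoted above.
first_scrape_estimate = round(6056 * 2.7 / 100)  # about 164 torchbearers
last_week_estimate = round(6891 * 7.2 / 100)     # about 496 torchbearers
```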

Many were celebrities or sportspeople where perhaps someone had taken the decision that they ‘needed no introduction’. But many also turned out to be corporate torchbearers.

By early July the numbers of these ‘mystery torchbearers’ had reached 500 and, having only identified a fifth, we published them through The Guardian datablog.

There were other signals, too, where knowing the way the torch relay operated helped.

For example, logistics meant that overseas torchbearers often carried the torch in the same location. This led to a cluster of Chinese torchbearers in Stansted, Hungarians in Dorset, Germans in Brighton, Americans in Oxford and Russians in North Wales.

As many corporate torchbearers were also based overseas, this helped narrow the search, with Germany’s corporate torchbearers in particular leading to an article in Der Tagesspiegel.

I also had the idea – thanks to Adrian Short – of totalling up how many torchbearers appeared each day, to identify days when details on unusually high numbers of torchbearers were missing. But it became apparent that variation due to other factors such as weekends and the Jubilee made this worthless.

However, the percentage of torchbearers missing stories each day did help (visualised below by Caroline Beavon), as it identified days when large numbers of overseas torchbearers were carrying the torch. I cross-referenced this with the ‘mystery torchbearer’ spreadsheet to see how many had already been checked, and which days still needed attention.

Daily totals - bar chart
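The per-day percentage behind a chart like this can be computed by grouping records on the relay date. This is a minimal sketch, assuming hypothetical `date` and `story` fields; using a share rather than a raw count is what smooths out the weekend and Jubilee variation mentioned above.

```python
from collections import defaultdict


def missing_share_by_day(rows):
    """Percentage of torchbearers with no story, for each relay date."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        day = row["date"]
        totals[day] += 1
        if not row.get("story", "").strip():
            missing[day] += 1
    return {day: 100 * missing[day] / totals[day] for day in sorted(totals)}


# Made-up rows for illustration; dates are ISO strings so they sort correctly.
sample_days = [
    {"date": "2012-06-01", "story": "Local volunteer."},
    {"date": "2012-06-01", "story": ""},
    {"date": "2012-06-02", "story": "Coach."},
    {"date": "2012-06-02", "story": "Fundraiser."},
]
shares = missing_share_by_day(sample_days)
```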

But the data was just the beginning. In the second part of this case study, I’ll talk about the verification process.

February 20 2012

09:31

“All that is required is an issue about which others are passionate and feel unheard”

Here’s a must-read for anyone interested in sports journalism that goes beyond the weekend’s player ratings. As one of the biggest names in European football goes into administration, The Guardian carries a piece by the author of Rangerstaxcase.com, a blogger who “pulled down the facade at Rangers”, including a scathing commentary on the Scottish press’s complicity in the club’s downfall:

“The Triangle of Trade to which I have referred is essentially an arrangement where Rangers FC and their owner provide each journalist who is “inside the tent” with a sufficient supply of transfer “exclusives” and player trivia to ensure that the hack does not have to work hard. Any Scottish journalist wishing to have a long career learns quickly not to bite the hands that feed. The rule that “demographics dictate editorial” applied regardless of original footballing sympathies.

“[...] Super-casino developments worth £700m complete with hover-pitches were still being touted to Rangers fans even after the first news of the tax case broke. Along with “Ronaldo To Sign For Rangers” nonsense, it is little wonder that the majority of the club’s fans were in a state of stupefaction in recent years. They were misled by those who ran their club. They were deceived by a media pack that had to know that the stories it peddled were false.”

Over at Rangerstaxcase.com, the site expands on this in its criticism of STV for uncritical reporting:

“There does not appear to be a point where the media learns its lessons. There is no capacity for improvement. No voice that says: we have been misled by people from this organisation so often in the past that we need to get corroboration before we publish anything more. Alastair Johnston, you will recall, artfully created the impression for Rangers’ supporters and shareholders  that the payment of the tax bills that are now crushing their club would be the responsibility of the parent company. His words then were carefully chosen to avoid actually lying, but his intended audience seemed in little doubt at the time as to what they thought he meant.  Either Mr. Johnston has been misrepresented by STV or he appears to be trying to gain an advantage in the battle to oust Whyte by misleading Rangers’ supporters.”

The piece also includes some interesting reflections on collaborative journalism and crowdsourcing:

“Rangerstaxcase.com has become a platform for some of the sharpest minds and most accomplished professionals to share information, debate, and form opinions based upon a rational interpretation of the facts rather than PR-firm fabrications. In all of the years when the mainstream media had a monopoly on opinion forming and agenda setting, the more sentient football fan had no outlet for his or her opinions. Blogs and other modern media, like Twitter, have democratised information distribution.

“Rangerstaxcase.com has gone far beyond its half-baked “I know a secret” origins to become a forum for citizen journalism. The power of the crowd‑sourced investigation initiated by anyone who is able to ignite the interest of others is a force that has the potential to move mountains in our society. All that is required is an issue about which others are passionate and feel unheard.”

Rangerstaxcase.com is not unique. Combine the passion of sports supporters with the lack of critical faculty in much sports journalism and you have potentially fertile ground.

For my own club, Bolton Wanderers, for example, I turn to Manny Road (site currently laid low by a malware attack).

For the Olympics there will be a regular and easy supply of good news stories to wade through, but also an extremely active network of local and international blogs from people scrutinising the foggier side of the Olympic spirit, which is why I set up Help Me Investigate the Olympics and am encouraging my students to connect with those communities.

February 03 2012

08:15

Video: Heather Brooke’s tips on investigating, and using the FOI and Data Protection Acts

The following 3 videos first appeared on the Help Me Investigate blog, Help Me Investigate: Health and Help Me Investigate: Welfare. I thought I’d collect them together here too. As always, these are published under a Creative Commons licence, so you are welcome to re-use, edit and combine with other video, with attribution (and a link!).

First, Heather Brooke’s tips for starting to investigate public bodies:

Her advice on investigating health, welfare and crime:

And on using the Data Protection Act:

April 13 2011

12:33

Which blog platform should I use? A blog audit

When people start out blogging they often ask what blogging platform they should use – WordPress or Blogger? Tumblr or Posterous? It’s impossible to give an answer, because the first questions should be: who is going to use it, how, what for, and who for?

To illustrate how the answers to those questions can help in choosing the best platform, I decided to go through the 35 or so blogs I have created, and why I chose the platforms that they use. As more and more publishing platforms have launched, and new features added, some blogs have changed platforms, while new ones have made different choices to older ones.

Bookmark blogs (Klogging) – Blogger and WordPress to Delicious and Tumblr

When I first began blogging it was essentially what’s called ‘klogging’ (knowledge blogging) – a way to keep a record of useful information. I started doing this with three blogs on Blogger, each of which was for a different class I taught: O-Journalism recorded reports in the field for online journalism students, Interactive Promotion and PR was created to inform students on a module of the same name (later exported to WordPress) and students on the Web and New Media module could follow useful material on that blog.

The blogs developed with the teaching, from being a place where I published supporting material, to a group blog where students themselves could publish their work in progress.

As a result, Web and New Media was moved to WordPress where it became a group blog maintained by students (now taught by someone else). The blog I created for the MA in Television and Interactive Content was first written by myself, then quickly handed over to that year’s students to maintain. When I started requiring students to publish their own blogs the original blogs were retired.

One-click klogging

By this time my ‘klogging’ had moved to Delicious. Webpages mentioned in a specific class were given a class-specific tag such as MMJ02 or CityOJ09. And students who wanted to dig further into a particular subject could use subject-specific tags such as ‘onlinevideo‘ or ‘datajournalism‘.

For the MA in Television and Interactive Content, then, I simply invented a new tag – ‘TVI’ – and set up a blog using Tumblr to pull anything I bookmarked on Delicious with that tag. (This was done in five minutes by clicking on ‘Customise‘ on the main Tumblr page, then clicking on Services and scrolling down to ‘Automatically import my…‘ and selecting RSS feed as Links. Then in the Feed URL box paste the RSS feed at the bottom of delicious.com/paulb/tvi).

(You can do something similar with WordPress – which I did here for all my bookmarks – but it requires more technical knowhow).

For klogging quotes for research purposes I also use Tumblr for Paul’s Literature Review. I’ve not used this as regularly or effectively as I could or should, but if I was embarking on a large piece of research it would be particularly useful in keeping track of key passages in what I’m reading. Used in conjunction with a Kindle, it could be especially powerful.

Back to the TVI bookmarks: another five minutes on Feedburner allowed me to set up a daily email newsletter of those bookmarks that students could subscribe to as well, and a further five minutes on Twitterfeed sent those bookmarks to a dedicated Twitter feed too (I could also have simply used Tumblr’s option to publish to a Twitter feed). ‘Blogging’ had moved beyond the blog.

Resource blogs – Tumblr and Posterous

For my Online Journalism module at City University London I use Tumblr to publish a curated, multimedia blog in addition to the Delicious bookmarks: Online Journalism Classes collects a limited number of videos, infographics, quotes and other resources for students. Tumblr was used because I knew most content would be instructional videos and I wanted a separate place to collect these.

The more general Paul Bradshaw’s Tumblelog (http://paulbradshaw.tumblr.com/) is where I maintain a collection of images, video, quotes and infographics that I look to whenever I need to liven up a presentation.

For resources based on notes or documents, however, Posterous is a better choice.

Python Notes and Notes on Spreadsheet Formulae and CAR, for example, both use Posterous as a simple way for me to blog my own notes on both (Python is a programming language) via a quick email (often drafted while on the move without internet access).

Posterous was chosen because it is very easy to publish and tag content, and I wanted to be able to access my notes based on tag (e.g. VLOOKUP) when I needed to remember how I’d used a particular formula or function.

Similarly, Edgbaston Election Campaign Expenses and Hall Green Election Campaign Expenses use Posterous as a quick way to publish and tag PDFs of election expense receipts from both constituencies (how this was done is explained here), allowing others to find expense details based on candidate, constituency, party or other details, and providing a space to post comments on findings or things to follow up.

Niche blogs – WordPress and Posterous

Although Online Journalism Blog began as ‘klogging’ it soon became something more, adding analysis, research, and contributions from other authors, and the number of users increased considerably. Blogger is not the most professional-looking of platforms, however (unless you’re prepared to do a lot of customisation), so I moved it to WordPress.com. And when I needed to install plugins for extra functionality I moved it again to a self-hosted WordPress site.

Finally, when the site was the victim of repeated hacking attempts I moved it to a WordPress MU (multi user) site hosted by Philip John’s Journal Local service, which provided technical support and a specialised suite of plugins.

If you want a powerful and professional-looking blogging platform it’s hard to beat WordPress.com, and if you want real control over how it works – such as installing plugins or customising themes – then a self-hosted WordPress site is, for me, your best option. I’d also recommend Journal Local if you want that combination of functionality and support.

If, however, you want to launch a niche blog quickly and functionality is not an issue then Posterous is an even better option, especially if there will be multiple contributors without technical skills. Council Coverage in Newspapers, for example, used Posterous to allow a group of people to publish the results of an investigation on my crowdsourced investigative journalism platform Help Me Investigate. The Hospital Parking Charges Blog did the same for another investigation, but as it was only me publishing, I used WordPress.

Group blogs – Posterous and Tumblr

Posterous suits groups particularly well because members only need to send their post to a specific email address that you give them (such as post@yourblog.posterous.com) to be published on the blog.

It also handles multimedia and documents particularly well – when I was helping Podnosh‘s Nick Booth train a group of people with Flip cameras we used Posterous as an easy way for members of a group to instantly publish the video interviews they were doing by simply sending it to the relevant email address (Posterous will also cross-publish to YouTube and Twitter, simplifying those processes).

A few months ago Posterous launched a special ‘Groups’ service that publishes content in a slightly different way to make it easier for members to collaborate. I used this for another Help Me Investigate investigation – Recording Council Meetings – where each part of the investigation is a post/thread that users can contribute to.

Again, Posterous provides an easy way to do this – all people need to know is the email address to send their contribution to, or the web address where they can add comments to other posts.

If your contributors are more blog-literate and want to retain more control over their content, another option for group blogs is Tumblr. Brumblr, for example, is one group blog I belong to for Birmingham bloggers, set up by Jon Bounds. ‘We Love Michael Grimes‘ is another, set up by Pete Ashton, that uses Tumblr for people to post images of Birmingham’s nicest blogger.

Blogs for events – Tumblr, Posterous, CoverItLive

When I organised a Citizen Journalism conference in 2007, I used a WordPress blog to build up to it, write about related stories, and then link to reports on the event itself. Likewise, when later that year the NUJ asked me to manage a team of student members as they blogged that year’s ADM, I used WordPress for a group blog.

As the attendees of further events began to produce their own coverage, the platforms I chose evolved. For JEEcamp.com (no longer online), I used a self-hosted WordPress blog with an aggregation plugin that pulled in anything tagged ‘JEEcamp’ on blogs or Twitter. CoverItLive was also used to liveblog – and was then adopted successfully by attendees when they returned to their own news operations around the country (and also, interestingly, by Downing Street after they saw the tool being used for the event).

For the final JEEcamp I used Tumblr as an aggregator, importing the RSS feed from blog search engine Icerocket for any mention of ‘JEEcamp’.

In future I may experiment with the Posterous iPhone app’s new Events feature, which aggregates posts in the same location as you.

Aggregators – Tumblr

Sometimes you just want a blog to keep a record of instances of a particular trend or theme. For example, I got so sick of people asking “Is blogging journalism?” that I set up Is Ice Cream Strawberry?, a Tumblr blog that aggregates any articles that mention the phrases “Is blogging journalism”, “Are bloggers journalists” and “Is Twitter journalism” on Google News.

This was set up in the same way as detailed above, with the Feed URL box completed using the RSS feed from the relevant search on Google News or Google Blog Search (repeat for each feed).
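For anyone curious about what such an aggregator is doing with a feed, here is a rough sketch in Python. It is not how Tumblr implements its import; the sample feed and its URLs are invented for the example, and only the three search phrases come from the setup described above.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 feed standing in for a news search feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example search feed</title>
<item><title>Is blogging journalism? A debate</title>
<link>http://example.com/1</link></item>
<item><title>Council approves new budget</title>
<link>http://example.com/2</link></item>
</channel></rss>"""

PHRASES = ["Is blogging journalism", "Are bloggers journalists", "Is Twitter journalism"]


def matching_items(feed_xml, phrases):
    """Return (title, link) pairs for feed items whose title contains a phrase."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if any(phrase.lower() in title.lower() for phrase in phrases):
            matches.append((title, link))
    return matches


found = matching_items(SAMPLE_FEED, PHRASES)
```

In practice the search feed has already done the filtering, so the aggregator only needs to republish each new item; the phrase check here just makes that logic visible.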

Likewise, Online Journalism Jobs aggregates – you’ve got it – jobs in online journalism or that use online journalism skills. It pulls from the RSS feed for anything I bookmark on Delicious with the tag ‘ojjobs’ – but it can also be done manually with the Tumblr bookmark or email address, which is useful when you want to archive an entire job description that is longer than Delicious’s character limit.

Easy hyperlocal blogging – WordPress, Posterous and Tumblr

For a devoted individual hyperlocal blog WordPress seems the best option due to its power, flexibility and professionalism. For a hyperlocal blog where you’re inviting contributions from community members via email, Posterous may be better.

But if you want to publish a hyperlocal blog and have never had the time to do it justice, Tumblr provides a good way to make a start without committing yourself to regular, wordy updates. Boldmere High Street is my own token gesture – essentially a photoblog that I update from my mobile phone when I see something of interest – and take a photo – as I walk down the high street.

Personal blogs

As personal blogs tend to contain off-the-cuff observations, copies of correspondence or media, Posterous suits them well. Paul Bradshaw O/T (Off Topic) is mine: a place to publish things that don’t fit on any of the other blogs I publish. I use Posterous as it tends to be email-based, sometimes just keeping web-based copies of emails I’ve sent elsewhere.

It’s difficult to prescribe a platform for personal blogs as they are so… personal. If you talk best about your life through snatches of images and quotes, Tumblr will work well. I have a family Tumblr, for example, that pulls images and video from a family Flickr account, tweets from a family Twitter feed, video from a family YouTube account, and also allows me to publish snatches of audio or quotes.

You could use this to, for instance, create an approved-members-only Facebook page for the family so other family members can ‘follow’ their grandchildren, and publish updates from the Tumblr blog via RSS Graffiti. Facebook is, ultimately, the most popular personal blogging platform.

If it is hard to separate your personal life from your professional life, or your personal hobby involves playing with technology, WordPress may be a better choice.

And Blogger may be an easy way to bring together material from Google properties such as Picasa and Orkut.

Company blogs

Likewise, although Help Me Investigate’s blog started as two separate blogs on WordPress (one for company updates, the other for investigation tips), it now uses Posterous for both as it’s an easier way for multiple people to contribute.

This is because, in our case, ease of publishing is more important than power – but for many companies WordPress is going to be the most professional and flexible option.

For some, Tumblr will best communicate their highly visual and creative nature. And for others, Posterous may provide a good place to easily publish documents and video.

Blogs – flexible enough for anything

What emerges from all the above is that blogs are just a publishing platform. There was a time when you had to customise WordPress, Typepad or Blogger to do what you wanted – from linkblogging and photoblogging to group blogs and aggregation. But those problems have since been solved by an increasing range of bespoke platforms.

Social bookmarking platforms and Twitter made it easier to linkblog; Tumblr made it easier to photoblog or aggregate RSS feeds. Posterous lowered the barrier to make group blogging as easy as sending an email. CoverItLive piggybacked on Twitter to aggregate live event coverage. And Facebook made bloggers of everyone without them realising.

A blog can now syndicate itself across multiple networks: Tumblr and Posterous make it easy to automatically cross-publish links and media to Twitter, YouTube and any other media-specific platform. RSS feeds can be pulled from Flickr, Delicious, YouTube or any of dozens of other services into a Facebook page or a WordPress widget.

What is important is not to be distracted by the technology, but focus on the people who will have to use it, and what they want to use it for.

To give a concrete example: I was once advising an organisation that wanted to publish its work online and help young people get their work out there. The young people used mobile phones (Blackberrys) and were on Facebook, but the organisation also wanted the content created by those young people to be seen by potential funders, in a professional context.

I advised them to:

  • Set up a moderated Posterous so that it would cross-publish to individuals’ Facebook pages (so there would be instant feedback for those users rather than it be published in an isolated space online that their friends had to go off and find);
  • Give the Posterous blog email address to the young people so they could use it to send in their work (making it easy to use on a device they were comfortable with);
  • Then to set up a separate ‘official’ WordPress site that pulled in the Posterous feed into a side-widget alongside the more professional, centrally placed, content (meeting the objectives of the organisation).

This sounds more technically complex than it is in practice, and the key thing is that it makes publishing as easy as possible: for the young users of the service, they only had to send images and comments to an email address. For members of the organisation they only had to write blog posts. Everything else, once set up, was automated. And free.

Many people hesitate before blogging, thinking that their effort has to be right first time. It doesn’t. Going through these blogs I counted around 35 that I’ve either created or been involved in. Many of those were retired when they ceased to be useful; some were transferred to new platforms. Some changed their names, some were deleted. Increasingly, they are intended from the start to have a limited shelf life. But every one has taught me something.

And those are just my experiences – how have you used blogs in different ways? And how has it changed?

March 31 2011

19:14

Quicker, smaller, more transparent: What Knight should do next? #JCARN

This month’s Carnival of Journalism is about “driving innovation” – in the wake of the end of the Knight Foundation’s News Challenge five year run, among other things. Here’s my take:

Driving innovation needs to be quick

Any innovative idea needs to be able to deploy and iterate quickly – and any scheme to fund innovation needs to support that.

Having been through the Knight News Challenge three times, and reached the final shortlist twice, I was struck each time by how much changed in the online world between the initial submission and final award: If an internet year is worth 4.7 normal years, this process was taking over 3 ‘years’ in internet time. So much changed during that period that by the time I had reached the second or third stage, I wanted to re-write the whole thing.

In contrast, when I entered Channel 4′s 4iP fund (far from perfect, but certainly faster), the time from application to approval was swift. This allowed us to spend a few months working with the funders in addressing the issues the project raised (in Help Me Investigate’s case, largely legal ones) and still being able to start work before the Knight awards had even been shortlisted.

Why the difference? Perhaps because of the next point.

Innovation thrives on limitations

One of the reasons the internet has been so disruptive is that it has lowered the barriers to entry. Multinational media organisations have thrown millions at their own solutions, and yet most of them fail. One of the problems that funds such as Knight’s and Channel 4′s aim to solve is that of access to funding – but those funds don’t have to be large.

The median value of a News Challenge award has ranged from $200,000 to $326,000 during its four years of existence, and I suspect one of the problems with Channel 4′s 4iP fund was that its £50m pot was based on television-scale budgets.

You don’t need a large amount of money to innovate online, and the best research and development comes after launch, because you can see how users are using it, and what they tell you they want it to do, or indeed what they build themselves for you.

So instead of funding to the hilt a dozen or so ideas that have to jump through several on-paper hoops to prove their theoretical viability, I would suggest this: spread small amounts of innovation funding wider across 100 pilot projects, and see how they jump through real-life hoops instead.

Projects that jump through those hoops could perhaps then apply to a second fund specifically aimed at the separate problem of scalability. I can speak from experience that running a pilot project gives you a much stronger sense of what you’ll need to do to scale up, than doing the same exercise on paper.

This second fund could even provide rapid access to servers or customer support staff or legal advice while the application is being considered (otherwise the customer experience becomes so bad that by the time funds are released, the project has no users left).

Separating funding innovation from funding scaling allows you to first fund projects that take bigger risks, and generate a bigger pool of innovators with experience of launching and managing an innovative product. And that leads on to the third point:

Support innovation, not projects

Every fund that I’ve been involved in neglected what could have been potentially their biggest value: the process itself of vetting applications and monitoring progress.

(It’s also the biggest source of resentment: there will always be accusations that funds are given to the ‘in-crowd’)

“Driving innovation” should go beyond “funding innovative projects”. Simply opening up the application process so that everyone can see how ideas develop – and what the ‘experts’ think about the detail of proposals – can help contribute to a culture of innovation. Seeing other great ideas being developed makes people feel a whole lot more innovative – and produce better ideas – than getting an opaque email saying “Proposal not accepted” and seeing a disappointing-on-the-surface winners’ list 5 months later.

For the funders this represents a lot of admin, but tough: that’s their job. And there are creative possibilities here: when you move the focus from funding innovative projects to supporting innovation you can start to broaden the focus towards building a network of innovators and aspiring innovators, towards creating a supportive ecology. That also spreads the costs, lowers risk, and increases benefits.

Ultimately, just as networked models are allowing us to revisit ways of doing things without physical limitations, the funding process should reflect that change too. It should be quicker, smaller scale, and more transparent.

February 23 2011

20:10

Help Me Investigate is now open source

I have now released the source code behind Help Me Investigate, meaning others can adapt it, install it, and add to it if they wish to create their own crowdsourcing platform or support the idea behind it.

This follows the announcement 2 weeks ago on the Help Me Investigate blog (more coverage on Journalism.co.uk and Editors Weblog).

The code is available on GitHub, here.

Collaborators wanted

I’m looking for collaborators and coders to update the code to Rails 3, write documentation to help users install it, improve the code/test, or even be the project manager for this project.

Over the past 18 months the site has surpassed my expectations. It’s engaged hundreds of people in investigations, furthered understanding and awareness of crowdsourcing, and been runner-up for Multimedia Publisher of the Year. In the process it attracted attention from around the world – people wanting to investigate everything from drug running in Mexico to corruption in South Africa.

Having the code on one site meant we couldn’t help those people: making it open source opens up the possibility, but it needs other people to help make that a reality.

If you know anyone who might be able to help, please shoot them a link. Or email me at paul(at)helpmeinvestigate.com

Many thanks to Chris Taggart and Josh Hart for their help with moving the code across.

11:04

Councils should allow public meetings to be recorded, says Pickles

A welcome window of clarity on the issue of whether bloggers can record public council meetings today: Local Government Secretary Eric Pickles has weighed in to say that public meetings should be open to bloggers and that councils should “routinely allow online filming of public discussions as part of increasing their transparency”.

It’s an issue that I’ve been investigating for a while on Help Me Investigate: while some councils actively stream their own meetings, and others allow members of the public to do the same, some councils explicitly forbid recording, others allow audio but require mayoral permission for video, and a few have conducted ‘investigations’ of citizens for daring to record public proceedings (and councillors), or ejected them from the room (see video above).

Pickles’ guidance – and the accompanying letter sent to all councils – provides useful material to show uncooperative councils.

The letter calls on councils to give “credible community or ‘hyper-local’ bloggers and online broadcasters the same routine access to council meetings as the traditional accredited media have”.

It also reassures councils that “giving greater access will not contradict data protection law requirements”. This is a key part, as data protection is often used as an excuse to prevent filming. The Help Me Investigate investigation revealed a worrying ignorance regarding data protection laws by councils even in formal internal reports. Other areas, including privacy, copyright, defamation and “procedural matters” are covered in this blog post rounding up some of the investigation’s findings.

Other material that bloggers may find useful is mentioned in Pickles’ announcement. It includes The Public Bodies (Admission to Meetings) Act 1960, The Local Government Act of 1972 and The Local Government (Access to Information) Act 1985.

I’m working on producing a cribsheet for bloggers wanting to record their local council’s public meetings. If you want to help, please leave a comment or subscribe to the investigation blog.

October 22 2010

11:00

Help Me Investigate – anatomy of an investigation

Earlier this year Andy Brightwell and I conducted some research into one of the successful investigations on my crowdsourcing platform Help Me Investigate. I wanted to know what had made the investigation successful – and how (or if) we might replicate those conditions for other investigations.

I presented the findings (presentation embedded above) at the Journalism’s Next Top Model conference in June. This post sums up those findings.

The investigation in question was ‘What do you know about The London Weekly?‘ – an investigation into a free newspaper that was (they claimed – part of the investigation was to establish if this was a hoax) about to launch in London.

The people behind the paper had made a number of claims about planned circulation, staffing and investment that most of the media reported uncritically. Martin Stabe, James Ball and Judith Townend, however, wanted to dig deeper. So, after an exchange on Twitter, Judith logged onto Help Me Investigate and started an investigation.

A month later members of the investigation had unearthed a wealth of detail about the people behind The London Weekly and the facts behind their claims. Some of the information was reported in MediaWeek and The Media Guardian podcast Media Talk; some formed the basis for posts on James Ball’s blog, Journalism.co.uk and the Online Journalism Blog. Some has, for legal reasons, remained unpublished.

A note on methodology

Andrew conducted a number of semi-structured interviews with contributors to the investigation. The sample was randomly selected but representative of the mix of contributors, who were categorised as ‘alpha’ contributors (over 6 contributions), ‘active’ contributors (2-6 contributions) or ‘lurkers’ (whose only contribution was to join the investigation). These interviews formed the qualitative basis for the research.

Complementing this data was quantitative information about users of the site as a whole. This was taken from two user surveys – one when the site was 3 months old and another at 12 months – and analysis of analytics taken from the investigation (such as numbers and types of actions, frequency, etc.).

What are the characteristics of a crowdsourced investigation?

One of the first things I wanted to analyse was whether the investigation data matched up to patterns observed elsewhere in crowdsourcing and online activity. An analysis of the number of actions by each user, for example, showed a clear ‘power law’ distribution, where a minority of users accounted for the majority of activity.
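As a rough illustration of that kind of check, here is a minimal Python sketch that measures how much of the total activity comes from the most active users. The function name, the 20% cut-off and the sample numbers are all assumptions made up for this example; they are not the investigation's data or the method used in the research, and a concentrated share on its own suggests, rather than proves, a power law.

```python
def activity_share(actions_per_user, top_fraction=0.2):
    """Return the share of all actions made by the most active users.

    actions_per_user maps a user id to that user's number of actions.
    top_fraction is the slice of users to treat as 'most active'.
    """
    counts = sorted(actions_per_user.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always look at at least one user, even in a tiny group.
    top_n = max(1, round(len(counts) * top_fraction))
    return sum(counts[:top_n]) / total


# Made-up numbers for illustration only.
sample = {
    "user_a": 40, "user_b": 25, "user_c": 5, "user_d": 3, "user_e": 2,
    "user_f": 1, "user_g": 1, "user_h": 1, "user_i": 1, "user_j": 1,
}
share = activity_share(sample, top_fraction=0.2)
print(f"Top 20% of users made {share:.0%} of actions")
```

In this invented sample two of the ten users account for most of the actions, which is the shape of distribution described above.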

This power law, however, did not translate into a breakdown approaching the 90-9-1 ‘law of participation inequality‘ observed by Jakob Nielsen. Instead, the balance between those who made a couple of contributions (normally the 9% of the 90-9-1 split) and those who made none (the 90%) was roughly equal. This may have been because the design of the site meant it was not possible to ‘lurk’ without being a member of the site already, or being invited and signing up.

Adding in data on those looking at the investigation page who were not members may have shed further light on this.
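The comparison with 90-9-1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier thresholds follow the methodology note (over 6 contributions, 2-6, and none), but the research does not say where a single contribution falls, so it is folded into 'active' here. The function names and sample counts are hypothetical.

```python
from collections import Counter


def classify(contributions):
    """Assign a contributor tier from a contribution count."""
    if contributions > 6:
        return "alpha"
    if contributions >= 1:
        # Assumption: the research defines 'active' as 2-6 contributions and
        # does not place a single contribution; it is treated as 'active'.
        return "active"
    return "lurker"  # joined the investigation but contributed nothing


def tier_split(contribution_counts):
    """Return each tier's share of all members as a fraction."""
    total = len(contribution_counts)
    if total == 0:
        return {"alpha": 0.0, "active": 0.0, "lurker": 0.0}
    tiers = Counter(classify(n) for n in contribution_counts)
    return {tier: tiers[tier] / total for tier in ("alpha", "active", "lurker")}


# Made-up counts for illustration only.
sample_counts = [12, 9, 4, 3, 2, 2, 0, 0, 0, 0]
split = tier_split(sample_counts)
for tier, share in split.items():
    print(f"{tier}: {share:.0%}")
```

With these invented counts the 'active' and 'lurker' groups come out the same size, the kind of roughly equal balance described above, rather than the 9-to-90 gap that Nielsen's rule would predict.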

What made the crowdsourcing successful?

Clearly, it is worth making a distinction between what made the investigation successful as a series of outcomes, and what made crowdsourcing successful as a method.

What made the community gather, and continue to return? One hypothesis was that the nature of the investigation provided a natural cue to interested parties – The London Weekly was published on Fridays and Saturdays and there was a build up of expectation to see if a new issue would indeed appear.

I was curious to see if the investigation had any ‘rhythm’. Would there be peaks of interest correlating to the expected publication?

The data threw up something else entirely. There was indeed a rhythm but it was Wednesdays that were the most popular day for people contributing to the investigation.

Why? Well, it turned out that one of the investigation’s ‘alpha’ contributors – James Ball – set himself a task to blog about the investigation every week. His blog posts appeared on a Wednesday.

That this turned out to be a significant factor in driving activity tells us one important lesson: talking publicly and regularly about the investigation’s progress is key.
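Looking for this kind of weekly rhythm needs nothing more than a count of contributions by day of the week. The Python sketch below assumes the contribution timestamps are available as ISO-format strings; the function name and the sample timestamps are invented for illustration and are not taken from the site's analytics.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def contributions_by_weekday(timestamps):
    """Count contributions per weekday from ISO-format timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        counts[DAY_NAMES[datetime.fromisoformat(ts).weekday()]] += 1
    return counts


# Made-up timestamps for illustration only.
sample_timestamps = [
    "2010-03-03T10:15:00",  # Wednesday
    "2010-03-03T14:30:00",  # Wednesday
    "2010-03-05T09:00:00",  # Friday
    "2010-03-06T11:45:00",  # Saturday
    "2010-03-10T16:20:00",  # Wednesday
]
counts = contributions_by_weekday(sample_timestamps)
busiest_day, busiest_count = counts.most_common(1)[0]
print(f"Busiest day: {busiest_day} ({busiest_count} contributions)")
```

A peak on an unexpected day, as with Wednesdays here, is the prompt to go and look for what is driving it.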

This data was backed up by the interviews. One respondent mentioned the “weekly cue” explicitly.

More broadly, it seems that the site helped keep track of a number of discussions taking place around the web. Having been born from a discussion on Twitter, further conversations on Twitter resulted in further people signing up, along with comments threads and other online discussion. This fit the way the site was designed culturally – to be part of a network rather than asking people to do everything on-site.

But the planned technical connectivity of the site with the rest of the web (being able to pull related tweets or bookmarks, for example) had been dropped during development as we focused on core functionality. This was not a bad thing, I should emphasise, as it prevented us becoming distracted with ‘bells and whistles’ and allowed us to iterate in reaction to user activity rather than our own assumptions of what users would want. This research documents that user activity and informs future development accordingly.

The presence of ‘alpha’ users like James and Judith was crucial in driving activity on the site – a pattern observed in other successful investigations. They picked up the threads contributed by others and not only wove them together into a coherent narrative that allowed others to enter more easily, but also set the new challenges that provided ways for people to contribute. The fact that they brought with them a strong social network presence is probably also a factor – but one that needs further research.

The site has always been designed to emphasise the role of the user in driving investigations. The agenda is not owned by a central publisher, but by the person posing the question – and therefore the responsibility is theirs as well. In this sense it draws on Jenkins’ argument that “Consumers will be more powerful within convergence culture – but only if they recognise and use that power.” This cultural hurdle may be the biggest one that the site has to address.

Indeed, the site is also designed to offer “Failure for free”, allowing users to learn what works and what doesn’t, and begin to take on that responsibility where required.

The investigation also suited crowdsourcing well, as it could be broken down into separate parts and paths – most of which could be completed online: “Where does this claim come from?” “Can you find out about this person?” “What can you discover about this company?”. One person, for example, used Google Streetview to establish that the registered address of the company was a postbox.

Other investigations that are less easily broken down may be less suitable for crowdsourcing – or require more effort to ensure success.

A regular supply of updates provided the investigation with momentum. The accumulation of discoveries provided valuable feedback to users, who then returned for more. In his book on Wikipedia, Andrew Lih (2009 p82) notes a similar pattern – ‘stigmergy‘ – that is observed in the natural world: “The situation in which the product of previous work, rather than direct communication [induces and directs] additional labour”. An investigation without these ‘small pieces, loosely joined’ might not suit crowdsourcing so well.

One problem, however, was that those paths led to a range of potential avenues of enquiry. In the end, although the core questions were answered (was the publication a hoax and what were the bases for their claims) the investigation raised many more questions.

These remained largely unanswered once the majority of users felt that their questions had been answered. Like any investigation, there came a point at which those involved had to make a judgement whether they wished to invest any more time in it.

Finally, the investigation benefited from a diverse group of contributors who contributed specialist knowledge or access. Some physically visited stations where the newspaper was claiming distribution to see how many copies were being handed out. Others used advanced search techniques to track down details on the people involved and the claims being made, or to make contact with people who had had previous experiences with those behind the newspaper.

The visibility of the investigation online led to more than one ‘whistleblower’ approach providing inside information.

What can be done to make it better?

Looking at the reasons that users of the site as a whole gave for not contributing to an investigation, the majority attributed this to ‘not having enough time’. Although at least one interviewee, in contrast, highlighted the simplicity and ease of contributing, it needs to be as easy and simple as possible for users to contribute in order to lower the perception of effort and time needed.

Notably, the second biggest reason for not contributing was a ‘lack of personal connection with an investigation’, demonstrating the importance of the individual and social dimension of crowdsourcing. Likewise, a ‘personal interest in the issue’ was the single largest factor in someone contributing. A ‘Why should I contribute?’ feature on each investigation may be worth considering.

Others mentioned the social dimension of crowdsourcing – the “sense of being involved in something together” – what Jenkins (2006) would refer to as “consumption as a networked practice”.

This motivation is also identified by Yochai Benkler in his work on networks. Looking at non-financial reasons why people contribute their time to online projects, he refers to “socio-psychological reward”. He also identifies the importance of “hedonic personal gratification”. In other words, fun. (Interestingly, these match two of the three traditional reasons for consuming news: because it is socially valuable, and because it is entertaining. The third – because it is financially valuable – neatly matches the third reason for working).

While it is easy to talk about “Failure for free”, more could be done to identify and support failing investigations. We are currently developing a monthly update feature that would remind users of recent activity and – more importantly – the lack of activity. The investigators in a group might be asked whether they wish to terminate the investigation in those cases, emphasising their role in its progress and helping ‘clean up’ the investigations listed on the first page of the site.

That said, there is also a danger in interfering too much in reducing failure. This is a natural instinct, and I have to continually remind myself that I started the project with an expectation of 95-99% of investigations ‘failing’ through a lack of motivation on the part of the instigator. That was part of the design. It was the 1-5% of questions that gained traction that would be the focus of the site (this is how Meetup works, for example – most groups ‘fail’ but there is no way to predict which ones. As it happens, the ‘success’ rate of investigations has been much higher than expected). One analogy is a news conference where members throw out ideas – only a few are chosen for investment of time and energy, the rest ‘fail’.

In the end, it is the management of that tension between interfering to ensure everything succeeds – and so removing the incentive for users to be self-motivated – and not interfering at all – leaving users feeling unsupported and unmotivated – that is likely to be the key to a successful crowdsourcing project. More than a year into the project, this is still a skill that I am learning.

07:44

Manchester Police tweets and the MEN – local data journalism part 2

Manchester Evening News visualisation of Police incident tweets

A week ago I blogged about how the Manchester Evening News were using data visualisation to provide a deeper analysis of the local police force’s experiment in tweeting incidents for 24 hours. In that post Head of Online Content Paul Gallagher said he thought the real benefit would “come afterwards when we can also plot the data over time”.

Now that data has been plotted, and you can see the results here.

In addition, you can filter the results by area, type (crime or ‘social work’) and category (specific sort of crime or social issue).

It’s a good follow up, although at the current time somewhat short of illuminating findings. The page introducing the interactive chart links to just one time-based story from the data: that between 9pm and 10pm at night a quarter of all calls relate to anti-social behaviour. There’s no indication that journalists will be digging for others.

The text also fails to invite users to contribute their own insights, instead presenting the tools as a way to find a personalised ‘story’ rather than the start of any collaborative process.

The visualisation tool could also be improved. While allowing you to look at any particular category and area in isolation, it doesn’t allow you to visually compare them to see, for example, whether Bolton or Bury is quieter at night, or whether burglary peaks in the morning in one area, but in the evening in another.

And of course, they’ve not linked to the original data to allow a helpful developer to do that for them (going into greater depth: the URL for each set of results is ‘hackable’ – i.e. easy to construct if you know what you’re looking for – and so it is easier to scrape the resulting tweets. However, the chart itself with the numbers in it is Flash-based, which creates a problem).

On the positive side, it’s good to see a clear basic visualisation with a base starting at 0.

If you do want the raw data, it’s been put together by The Guardian’s Michael Brunton-Spall and is available here.

This formed the basis for a day of activity at a Hacks & Hackers Day last week, which the Manchester Evening News took part in. The results of that can be read on the Scraperwiki blog and on Andy Dickinson’s blog. These included:

“David Kendal produced his own project mapping 999 calls in the area. He took the tweet data and put it through the Yahoo placemaker tool, plotting information on a Google map, to see which areas got calls over certain periods of time.

“Yuwei Lin and Enrico Zini [produced] a GMP tweet database, and showed a very neat search tool that allowed analysis of certain aspects of the police data (3257 items).”

And unrelated to the police tweets but of enormous use to journalists was the creation of judgmental.org.uk, a website of United Kingdom case judgment data.

“At the moment this is only available via Bailli and the team wanted to make something more usable and searchable (Bailli’s data cannot be scraped or indexed by Google).

“It is still a work in progress, but could eventually provide a very useful tool for journalists. Although the data is not updated past a certain point, journalists would be able to analyse the information for different factors: which judges made which judgments? What is the level of activity in different courts? Which times of year are busier? It could be scrutinised to determine different aspects of the cases.”

I’m immensely pleased to see this come about as a result in part (I’m told) of an investigation on Help Me Investigate last year.

September 29 2010

08:10

Time to talk about legal

As a lone blogger how much legal protection do you have? No more than anyone else, when it comes to libel, contempt of court law and so on, except that people are more likely to pay attention to large media organisations.

But there are many instances where bloggers have lost a lot of time and money over legal disputes. Last week, for example, journalist and blogger Dave Osler finally saw an end to a legal battle that consumed three years of his life, after he was sued for libel by the political activist Johanna Kaschke. Despite being refused the right to appeal the strike-out of the Osler case, she is still planning to appeal another High Court decision that ended her libel claim against Alex Hilton and John Gray.

If all individual bloggers worried about getting into trouble too much, we’d write much less than we do. Even big scary cases aren’t a deterrent: Dave Osler is still blogging. I was personally surprised by the results of my survey of 71 small online publishers this summer. Not that only 27 per cent had been involved in legal disputes (that was about what I expected) but that over half were satisfied with the number of legal resources available.

Personally, the grey areas of law trouble me and I don’t think there could be enough support: I’d like to see more organised structures for legal help, a sort of Citizens Advice Bureau for bloggers, if you like. Informal advice is already spreading via social networks, as lawyers increasingly use Twitter and blogs to join the conversation.

As I reported on my site Meeja Law, one hyperlocal blogger who was accused of breach of copyright asked for legal advice via Twitter: “Two separate media lawyers confirmed (for free) that I’d done nothing wrong. I also contacted [hyperlocal organisation] Talk About Local for advice, and they told me the same.”

Talk About Local has published several media law guides online (eg. this one on defamation) and the organisation’s founder William Perrin offers some frank legal advice ahead of a legal session at last weekend’s London Local Neighbourhoods Online Unconference:

…just about the best legal advice, which very few follow is to set up a limited company and keep the website inside that. Then you don’t lose your house to a nutter under defamation law….

Another concern of mine is the lack of transparency of courts data, something I’ve discussed at length here. I think bloggers should be able to access more information about cases; at the very least, the Ministry of Justice needs to consider its outmoded contempt of court law that is ill-equipped to deal with the online age.

In the coming months, I’d like to build up the conversation in this area and think about how we might approach some of these issues. If you’d like to be part of this informal online ‘working group’ please consider joining the Help Me Investigate challenge at this link (request membership here), or discussing via the OJB Facebook group.

Judith Townend (@jtownend on Twitter) is a PhD research student at City University London and freelance journalist.

July 20 2010

15:06

Election campaign expenses – an online investigation

Last week Channel 4 and the Bureau of Investigative Journalism “raised questions” over the election campaign expenses of Conservative MP Zac Goldsmith, specifically the practice of claiming partial expenses on the grounds that ‘not all material was used’.

The response from Goldsmith and the Conservative Party seemed to argue that this was standard practice. “The examples raised could be seen in the returns of other candidates.”

So I decided to obtain the expenses receipts for two of the most closely-fought campaigns in Birmingham, and put them online, with the invitation for others to take a look to see if that is indeed true.

And now the receipts for the election campaigns of Gisela Stuart (Lab) and Deirdre Allen (Con) for Edgbaston can be found at EdgbastonElectionExpenses.posterous.com

Here’s the plan:

Alternatively, you can start your own investigation into your own local election candidates. Here’s how to do that.

Remember, finding nothing is still a finding, as it challenges Goldsmith’s story.

14:34

Online innovator to leave university post after ‘complicated decision’

Online journalism innovator Paul Bradshaw has taken voluntary redundancy from his post as course leader for the online journalism MA at Birmingham City University, in what he says was a “complicated decision”.

Bradshaw, who is also founder of the Online Journalism Blog, hopes he can now invest more time in his own projects, with immediate plans to develop his Help Me Investigate site.

“It was a very complicated decision,” he told Journalism.co.uk. “There are a lot of opportunities around data journalism that I want to explore and I want to spend more time on Help Me Investigate. I felt it was probably the right time to dive in to more of those opportunities and now I have time to accept offers I have been made. But I am wary of taking too much work on. Part of the point is to invest more time in Help Me Investigate. I plan to start some development work and explore business models soon.”

Bradshaw is also already working on two different books, his own on magazine editing which is set to be completed by the end of the year and another dedicated to online journalism, which he is contributing to with former FT.com news editor Liisa Rohumaa, likely to be out by early next year.

On top of all that, he admits he may keep his toes in the teaching pool.

“I will certainly miss parts of teaching,” he told Journalism.co.uk. “I absolutely, enormously enjoyed teaching the students this year. Some of their work has been the best so far. I may still do a bit of teaching, but I think I have always wanted to keep growing and developing. The students say they are gutted, but they were quite excited and positive about what I am doing. I am experiencing a huge jumble of emotions. I am excited about the possibilities but I am really going to miss the students and staff.”


