
July 11 2011


An experiment in creating an ‘Auto-Debunker’ twitter account

As the conspiracy theories flew around last Friday, one in particular caught fire: the idea that the News Of The World might have been closed down because it would then allow for its assets – i.e. incriminating evidence – to be destroyed.

Perhaps because it was published under the Reuters brand (although the byline absolved them of any responsibility for its contents), by the end of the day it had accumulated over 4,000 retweets.

I had already personally tweeted a couple of those users to point out that comments on the article had quickly debunked its argument. And by 6.26 that evening David Allen Green had published an explanation of the flaws in a piece at the New Statesman.

But people were still retweeting: how to connect the two?

Creating @autodebunker

It took me all of 20 minutes to hack together a simple automated service that would reply to people retweeting the Reuters blog post.

Here’s how to do it:

1. Create a specific Twitter account

I first of all named it autocorrecter but have now changed it to autodebunker. Make sure there’s an explanation in the biography that it is automated – and attribute authorship so people can make a judgement on its authority. Use the bio link to point at a page explaining more – I’ve linked it to this post.

2. Find an RSS feed for tweets that need debunking

If you use Twitter’s advanced search facility you can search for all tweets mentioning the Reuters blog post in question – even if they’ve been shortened. Just put the URL into the box marked ‘All of these words‘.

You’ll also want to avoid creating a loop in which you reply to your own tweets (because these will contain the URL too). So in the box ‘None of these words‘ put your new Twitter account name – in this case, autodebunker.

Unfortunately the user’s name is not included in a tweet unless it is retweeted, so it’s best to make sure that you only include tweets with RT in them too. Add RT to the box marked ‘All of these words‘.

As others debunk the story you may have to exclude mentions of them as well, so you’re not tweeting at people who are already debunking. Likewise any other similar indicators.
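The search rules in this step can also be sketched in code. This is only an illustration of the logic, not how Twitter's search works: the tweet fields `user` and `text` and the sample tweets are my own assumptions, and unlike Twitter's search this sketch only matches the full URL, not shortened versions.

```python
# Hypothetical sample data: the field names "user" and "text" are assumptions.
STORY_URL = "http://blogs.reuters.com/mediafile/2011/07/07/is-murdoch-free-to-destroy-tabloids-records/"
EXCLUDED = {"autocorrecter", "autodebunker", "davidallengreen"}


def needs_reply(tweet, url=STORY_URL, excluded=EXCLUDED):
    """Return True if a tweet matches the search rules described in step 2."""
    text = tweet["text"]
    words = text.lower().split()
    # 'All of these words': the story URL and RT must both appear.
    if url not in text or "rt" not in words:
        return False
    # 'None of these words': skip our own account and known debunkers,
    # whether they wrote the tweet or are mentioned in it.
    if tweet["user"].lower() in excluded:
        return False
    mentioned = {word.strip("@:,.") for word in words}
    return not (mentioned & excluded)


tweets = [
    {"user": "alice", "text": "RT @bob: Is Murdoch free to destroy records? " + STORY_URL},
    {"user": "autodebunker", "text": "See comments though RT @bob: " + STORY_URL},
    {"user": "carol", "text": "RT @davidallengreen: why this is wrong " + STORY_URL},
    {"user": "dave", "text": "Interesting read " + STORY_URL},
]
to_reply = [t["user"] for t in tweets if needs_reply(t)]
print(to_reply)
```

Only the first sample tweet passes: the second comes from the bot itself, the third mentions someone already debunking, and the fourth is not a retweet.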

My search boxes eventually looked like this (the account was originally called autocorrecter, so I had to exclude that as well)

3. Create a new RSS feed using Feedburner

The service we’re going to use will not let us publish a Twitter search RSS feed onto Twitter, and we also need to be able to easily edit this feed in response to changes. Feedburner is a very useful service for doing both.

You’ll be using the RSS feed for the advanced search mentioned above. When you conduct the search you will see a link to that RSS feed on the right – and if you click on it you should be taken to an address like: http://search.twitter.com/search.atom?q=+RT+http%3A%2F%2Fblogs.reuters.com%2Fmediafile%2F2011%2F07%2F07%2Fis-murdoch-free-to-destroy-tabloids-records%2F++-autocorrecter+-autodebunker+-davidallengreen
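If you need to edit that address often, it may be easier to build it than to retype it. The sketch below assembles a query string in the same style as the address above; the function name `build_search_url` is my own, and the endpoint is simply the one quoted in this post, which may not respond any more.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://search.twitter.com/search.atom"  # address quoted in the post


def build_search_url(required, excluded):
    """Combine required terms and excluded terms into one search feed URL."""
    terms = list(required) + ["-" + name for name in excluded]
    # urlencode turns spaces into '+' and escapes ':' and '/' in the story URL.
    return SEARCH_ENDPOINT + "?" + urlencode({"q": " ".join(terms)})


feed_url = build_search_url(
    required=[
        "RT",
        "http://blogs.reuters.com/mediafile/2011/07/07/is-murdoch-free-to-destroy-tabloids-records/",
    ],
    excluded=["autocorrecter", "autodebunker", "davidallengreen"],
)
print(feed_url)
```

Adding another account to exclude is then a one-word change to the `excluded` list.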

Copy that address and paste it into the box in Feedburner when you’re asked what feed you want to convert – it’s pretty much at the bottom of the first page you get when you log on to Feedburner. Follow the steps for creating a new RSS feed.

You’ll then be given a new address that begins with http://feeds.feedburner.com/ (something like: http://feeds.feedburner.com/Http/blogsreuterscom/mediafile/2011/07/07/is-murdoch-free-to-destroy-tabloids-records/-TwitterSearch?format=xml)

Copy this address for the next stage.

4. Use Twitterfeed to automatically publish debunking tweets

Twitterfeed is a great service for this. Log on and click on the button to create a new feed.

Give it a name and paste in the Feedburner RSS feed you copied above – but staying in step 1, click on ‘Advanced Settings‘ below.

The key areas here are ‘Post Prefix‘ and ‘Post Suffix’. This is what you want added to what you’re republishing.

Sadly, both are limited to 20 characters, but here’s what I added: in Post Prefix, I typed ‘See comments though’. In Post Suffix I added (probably not true).

You could add a shortened URL in either, too.
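To check a prefix and suffix before pasting them in, something like the following may help. It is a rough sketch: the 20-character limit comes from the settings described above, but the 140-character tweet limit and the way the middle of the tweet is trimmed are my assumptions, not a description of what Twitterfeed actually does.

```python
MAX_AFFIX = 20    # limit the post describes for Post Prefix and Post Suffix
MAX_TWEET = 140   # tweet length limit at the time of writing (assumed here)


def compose(prefix, body, suffix):
    """Join prefix, body and suffix, trimming the body so the result fits a tweet."""
    for label, value in (("prefix", prefix), ("suffix", suffix)):
        if len(value) > MAX_AFFIX:
            raise ValueError(f"{label} is {len(value)} characters; limit is {MAX_AFFIX}")
    # Two spaces separate the three parts.
    room = MAX_TWEET - len(prefix) - len(suffix) - 2
    if len(body) > room:
        body = body[: room - 3] + "..."
    return f"{prefix} {body} {suffix}"


tweet = compose(
    "See comments though",
    "RT @example: Is Murdoch free to destroy tabloid's records? http://example.com/short",
    "(probably not true)",
)
print(len(tweet), tweet)
```

The account name and short link in the example are placeholders.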

In Step 2 connect this to your new Twitter account by clicking the ‘Authorise’ button.

Don’t forget to click the final ‘Finish’ button to activate it all.

After a while it should start publishing tweets like this:

5. Monitor and tweak

Don’t forget to keep checking what people are tweeting, and what the account is tweeting, and adapt the Feedburner feed or Post Prefix and Suffix accordingly.

Any suggestions?

This is just a quick hack – there will be better ways of doing the above, but it’s an illustration of how you can use computer power to communicate with a distributed population of distributors. If you decide to do more with the idea, I’d love to know about it.

And after all this, of course, you have to ask: why have Reuters not updated their blog post to at least acknowledge the criticisms?


April 13 2011


Which blog platform should I use? A blog audit

When people start out blogging they often ask what blogging platform they should use – WordPress or Blogger? Tumblr or Posterous? It’s impossible to give an answer, because the first questions should be: who is going to use it, how, for what, and for whom?

To illustrate how the answers to those questions can help in choosing the best platform, I decided to go through the 35 or so blogs I have created, and why I chose the platforms that they use. As more and more publishing platforms have launched, and new features added, some blogs have changed platforms, while new ones have made different choices to older ones.

Bookmark blogs (Klogging) – Blogger and WordPress to Delicious and Tumblr

When I first began blogging it was essentially what’s called ‘klogging’ (knowledge blogging) – a way to keep a record of useful information. I started doing this with three blogs on Blogger, each of which was for a different class I taught: O-Journalism recorded reports in the field for online journalism students, Interactive Promotion and PR was created to inform students on a module of the same name (later exported to WordPress) and students on the Web and New Media module could follow useful material on that blog.

The blogs developed with the teaching, from being a place where I published supporting material, to a group blog where students themselves could publish their work in progress.

As a result, Web and New Media was moved to WordPress where it became a group blog maintained by students (now taught by someone else). The blog I created for the MA in Television and Interactive Content was first written by myself, then quickly handed over to that year’s students to maintain. When I started requiring students to publish their own blogs the original blogs were retired.

One-click klogging

By this time my ‘klogging’ had moved to Delicious. Webpages mentioned in a specific class were given a class-specific tag such as MMJ02 or CityOJ09. And students who wanted to dig further into a particular subject could use subject-specific tags such as ‘onlinevideo‘ or ‘datajournalism‘.

For the MA in Television and Interactive Content, then, I simply invented a new tag – ‘TVI’ – and set up a blog using Tumblr to pull anything I bookmarked on Delicious with that tag. (This was done in five minutes by clicking on ‘Customise‘ on the main Tumblr page, then clicking on Services and scrolling down to ‘Automatically import my…‘ and selecting RSS feed as Links. Then in the Feed URL box paste the RSS feed at the bottom of delicious.com/paulb/tvi).

(You can do something similar with WordPress – which I did here for all my bookmarks – but it requires more technical knowhow).

For klogging quotes for research purposes I also use Tumblr for Paul’s Literature Review. I’ve not used this as regularly or effectively as I could or should, but if I were embarking on a large piece of research it would be useful for keeping track of key passages in what I’m reading. Used in conjunction with a Kindle, it could be particularly powerful.

Back to the TVI bookmarks: another five minutes on Feedburner allowed me to set up a daily email newsletter of those bookmarks that students could subscribe to as well, and a further five minutes on Twitterfeed sent those bookmarks to a dedicated Twitter feed too (I could also have simply used Tumblr’s option to publish to a Twitter feed). ‘Blogging’ had moved beyond the blog.

Resource blogs – Tumblr and Posterous

For my Online Journalism module at City University London I use Tumblr to publish a curated, multimedia blog in addition to the Delicious bookmarks: Online Journalism Classes collects a limited number of videos, infographics, quotes and other resources for students. Tumblr was used because I knew most content would be instructional videos and I wanted a separate place to collect these.

The more general Paul Bradshaw’s Tumblelog (http://paulbradshaw.tumblr.com/) is where I maintain a collection of images, video, quotes and infographics that I look to whenever I need to liven up a presentation.

For resources based on notes or documents, however, Posterous is a better choice.

Python Notes and Notes on Spreadsheet Formulae and CAR, for example, both use Posterous as a simple way for me to blog my own notes on both (Python is a programming language) via a quick email (often drafted while on the move without internet access).

Posterous was chosen because it is very easy to publish and tag content, and I wanted to be able to access my notes based on tag (e.g. VLOOKUP) when I needed to remember how I’d used a particular formula or function.

Similarly, Edgbaston Election Campaign Expenses and Hall Green Election Campaign Expenses use Posterous as a quick way to publish and tag PDFs of election expense receipts from both constituencies (how this was done is explained here), allowing others to find expense details based on candidate, constituency, party or other details, and providing a space to post comments on findings or things to follow up.

Niche blogs – WordPress and Posterous

Although Online Journalism Blog began as ‘klogging’ it soon became something more, adding analysis, research, and contributions from other authors, and the number of users increased considerably. Blogger is not the most professional-looking of platforms, however (unless you’re prepared to do a lot of customisation), so I moved it to WordPress.com. And when I needed to install plugins for extra functionality I moved it again to a self-hosted WordPress site.

Finally, when the site was the victim of repeated hacking attempts I moved it to a WordPress MU (multi user) site hosted by Philip John’s Journal Local service, which provided technical support and a specialised suite of plugins.

If you want a powerful and professional-looking blogging platform it’s hard to beat WordPress.com, and if you want real control over how it works – such as installing plugins or customising themes – then a self-hosted WordPress site is, for me, your best option. I’d also recommend Journal Local if you want that combination of functionality and support.

If, however, you want to launch a niche blog quickly and functionality is not an issue then Posterous is an even better option, especially if there will be multiple contributors without technical skills. Council Coverage in Newspapers, for example, used Posterous to allow a group of people to publish the results of an investigation on my crowdsourced investigative journalism platform Help Me Investigate. The Hospital Parking Charges Blog did the same for another investigation, but as it was only me publishing, I used WordPress.

Group blogs – Posterous and Tumblr

Posterous suits groups particularly well because members only need to send their post to a specific email address that you give them (such as post@yourblog.posterous.com) to be published on the blog.
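For the curious, a post-by-email message is just an ordinary email. The sketch below builds one without sending it; the sender address is made up, the blog address is the placeholder from the paragraph above, and treating the subject line as the post title is my assumption about how the service reads the message.

```python
from email.message import EmailMessage


def build_post_email(sender, blog_address, title, body):
    """Build (but do not send) an email whose subject is the post title."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = blog_address
    msg["Subject"] = title
    msg.set_content(body)
    return msg


msg = build_post_email(
    sender="contributor@example.com",            # hypothetical contributor
    blog_address="post@yourblog.posterous.com",  # placeholder address from the text
    title="Parking charges at the local hospital",
    body="Notes from this morning's visit...",
)
print(msg["To"], "|", msg["Subject"])
```

Sending it would need a mail server, for example through `smtplib`, which is left out here.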

It also handles multimedia and documents particularly well – when I was helping Podnosh‘s Nick Booth train a group of people with Flip cameras we used Posterous as an easy way for members of a group to instantly publish the video interviews they were doing by simply sending it to the relevant email address (Posterous will also cross-publish to YouTube and Twitter, simplifying those processes).

A few months ago Posterous launched a special ‘Groups’ service that publishes content in a slightly different way to make it easier for members to collaborate. I used this for another Help Me Investigate investigation - Recording Council Meetings – where each part of the investigation is a post/thread that users can contribute to.

Again, Posterous provides an easy way to do this – all people need to know is the email address to send their contribution to, or the web address where they can add comments to other posts.

If your contributors are more blog-literate and want to retain more control over their content, another option for group blogs is Tumblr. Brumblr, for example, is one group blog I belong to for Birmingham bloggers, set up by Jon Bounds. ‘We Love Michael Grimes‘ is another, set up by Pete Ashton, that uses Tumblr for people to post images of Birmingham’s nicest blogger.

Blogs for events – Tumblr, Posterous, CoverItLive

When I organised a Citizen Journalism conference in 2007, I used a WordPress blog to build up to it, write about related stories, and then link to reports on the event itself. Likewise, when later that year the NUJ asked me to manage a team of student members as they blogged that year’s ADM, I used WordPress for a group blog.

As the attendees of further events began to produce their own coverage, the platforms I chose evolved. For JEEcamp.com (no longer online), I used a self-hosted WordPress blog with an aggregation plugin that pulled in anything tagged ‘JEEcamp’ on blogs or Twitter. CoverItLive was also used to liveblog – and was then adopted successfully by attendees when they returned to their own news operations around the country (and also, interestingly, by Downing Street after they saw the tool being used for the event).

For the final JEEcamp I used Tumblr as an aggregator, importing the RSS feed from blog search engine Icerocket for any mention of ‘JEEcamp’.

In future I may experiment with the Posterous iPhone app’s new Events feature, which aggregates posts in the same location as you.

Aggregators – Tumblr

Sometimes you just want a blog to keep a record of instances of a particular trend or theme. For example, I got so sick of people asking “Is blogging journalism?” that I set up Is Ice Cream Strawberry?, a Tumblr blog that aggregates any articles that mention the phrases “Is blogging journalism”, “Are bloggers journalists” and “Is Twitter journalism” on Google News.

This was set up in the same way as detailed above, with the Feed URL box completed using the RSS feed from the relevant search on Google News or Google Blog Search (repeat for each feed).

Likewise, Online Journalism Jobs aggregates – you’ve got it – jobs in online journalism or that use online journalism skills. It pulls from the RSS feed for anything I bookmark on Delicious with the tag ‘ojjobs’ – but it can also be done manually with the Tumblr bookmark or email address, which is useful when you want to archive an entire job description that is longer than Delicious’s character limit.

Easy hyperlocal blogging – WordPress, Posterous and Tumblr

For a devoted individual hyperlocal blog WordPress seems the best option due to its power, flexibility and professionalism. For a hyperlocal blog where you’re inviting contributions from community members via email, Posterous may be better.

But if you want to publish a hyperlocal blog and have never had the time to do it justice, Tumblr provides a good way to make a start without committing yourself to regular, wordy updates. Boldmere High Street is my own token gesture – essentially a photoblog that I update from my mobile phone when I see something of interest – and take a photo – as I walk down the high street.

Personal blogs

As personal blogs tend to contain off-the-cuff observations, copies of correspondence or media, Posterous suits them well. Paul Bradshaw O/T (Off Topic) is mine: a place to publish things that don’t fit on any of the other blogs I publish. I use Posterous as it tends to be email-based, sometimes just keeping web-based copies of emails I’ve sent elsewhere.

It’s difficult to prescribe a platform for personal blogs as they are so… personal. If you talk best about your life through snatches of images and quotes, Tumblr will work well. I have a family Tumblr, for example, that pulls images and video from a family Flickr account, tweets from a family Twitter feed, video from a family YouTube account, and also allows me to publish snatches of audio or quotes.

You could use this to, for instance, create an approved-members-only Facebook page for the family so other family members can ‘follow’ their grandchildren, and publish updates from the Tumblr blog via RSS Graffiti. Facebook is, ultimately, the most popular personal blogging platform.

If it is hard to separate your personal life from your professional life, or your personal hobby involves playing with technology, WordPress may be a better choice.

And Blogger may be an easy way to bring together material from Google properties such as Picasa and Orkut.

Company blogs

Although Help Me Investigate’s blog started as two separate blogs on WordPress (one for company updates, the other for investigation tips), it now uses Posterous for both as it’s an easier way for multiple people to contribute.

In that case ease of publishing is more important than power – but for many companies WordPress is going to be the most professional and flexible option.

For some, Tumblr will best communicate their highly visual and creative nature. And for others, Posterous may provide a good place to easily publish documents and video.

Blogs – flexible enough for anything

What emerges from all the above is that blogs are just a publishing platform. There was a time when you had to customise WordPress, Typepad or Blogger to do what you wanted – from linkblogging and photoblogging to group blogs and aggregation. But those problems have since been solved by an increasing range of bespoke platforms.

Social bookmarking platforms and Twitter made it easier to linkblog; Tumblr made it easier to photoblog or aggregate RSS feeds. Posterous lowered the barrier to make group blogging as easy as sending an email. CoverItLive piggybacked on Twitter to aggregate live event coverage. And Facebook made bloggers of everyone without them realising.

A blog can now syndicate itself across multiple networks: Tumblr and Posterous make it easy to automatically cross-publish links and media to Twitter, YouTube and any other media-specific platform. RSS feeds can be pulled from Flickr, Delicious, YouTube or any of dozens of other services into a Facebook page or a WordPress widget.

What is important is not to be distracted by the technology, but focus on the people who will have to use it, and what they want to use it for.

To give a concrete example: I was once advising an organisation that wanted to publish its work online and help young people get their work out there. The young people used mobile phones (Blackberrys) and were on Facebook, but the organisation also wanted the content created by those young people to be seen by potential funders, in a professional context.

I advised them to:

  • Set up a moderated Posterous so that it would cross-publish to individuals’ Facebook pages (so there would be instant feedback for those users rather than it be published in an isolated space online that their friends had to go off and find);
  • Give the Posterous blog email address to the young people so they could use it to send in their work (making it easy to use on a device they were comfortable with);
  • Then to set up a separate ‘official’ WordPress site that pulled in the Posterous feed into a side-widget alongside the more professional, centrally placed, content (meeting the objectives of the organisation).

This sounds more technically complex than it is in practice, and the key thing is that it makes publishing as easy as possible: the young users of the service only had to send images and comments to an email address, and members of the organisation only had to write blog posts. Everything else, once set up, was automated. And free.

Many people hesitate before blogging, thinking that their effort has to be right first time. It doesn’t. Going through these blogs I counted around 35 that I’ve either created or been involved in. Many of those were retired when they ceased to be useful; some were transferred to new platforms. Some changed their names, some were deleted. Increasingly, they are intended from the start to have a limited shelf life. But every one has taught me something.

And those are just my experiences – how have you used blogs in different ways? And how has it changed?


April 11 2011


Data for journalists: understanding XML and RSS

If you are working with data, chances are that sooner or later you will come across XML – or if you don’t, then, well, you should do. Really.

There are some very useful resources in XML format – and in RSS, which is based on XML – from ongoing feeds and static reference files to XML that is provided in response to a question that you ask. All of that is for future posts – this post attempts to explain how XML is relevant to journalism, and how it is made up.

What is XML?

XML is a language for describing information, which makes it particularly relevant to journalists – especially when it comes to interrogating large sets of data.

If you wanted to know how many doctors were privately educated, or what the most common score was in the Premiership last season, or which documents were authored by a particular civil servant, then XML may be useful to you.

(That said, this post doesn’t show you how to do any of that – it is mainly aimed at explaining how XML works so that you can begin to think about those possibilities.)

XML stands for “eXtensible Markup Language”. It’s the ‘markup’ bit which is key: XML ‘marks up’ information as being something in particular: relating to a particular date, for example; or a particular person; or referring to a particular location.

For example, a snippet of XML like this -


- tells you that the ‘Paris’ in this instance is a city, rather than a celebrity. And that it’s in France, not Texas.

That makes it easier for you to filter out information that isn’t relevant, or combine particular bits of information with data from elsewhere.
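To make that concrete, here is a small sketch in Python. The markup is hypothetical: the tag names `<place>`, `<city>` and `<country>` are invented for illustration and are not taken from any real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names <place>, <city> and <country> are
# invented for illustration.
snippet = """
<place>
  <city>Paris</city>
  <country>France</country>
</place>
"""

place = ET.fromstring(snippet)
city = place.findtext("city")
country = place.findtext("country")
print(f"{city} is marked up as a city in {country}")
```

Because the word ‘Paris’ sits inside a city tag, a program can tell what kind of thing it is without guessing from context.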

For example, if an XML file contains information on authors, you can filter out all but those by the person you’re interested in; if it contains publication dates, you can use that to plot associated content on a timeline.

Most usefully, if you have a set of data yourself such as a spreadsheet, you can pull related data from a relevant XML file. If your spreadsheet contains football teams and the XML provides locations, images, and history for each, then you can pull that in to create a fuller picture. If it contains addresses, there are services that will give you XML files with the constituency for those postcodes.
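The football example might look something like this in code. It is a minimal sketch under stated assumptions: the spreadsheet columns, the XML tag names and the rows themselves are all made up to show the matching step.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical data: a two-column "spreadsheet" and an XML reference file.
spreadsheet = io.StringIO("team,points\nAston Villa,48\nBirmingham City,39\n")
reference_xml = """
<teams>
  <team><name>Aston Villa</name><location>Birmingham</location><founded>1874</founded></team>
  <team><name>Birmingham City</name><location>Birmingham</location><founded>1875</founded></team>
</teams>
"""

# Index the XML by team name so each spreadsheet row can look up its details.
details = {
    team.findtext("name"): {
        "location": team.findtext("location"),
        "founded": team.findtext("founded"),
    }
    for team in ET.fromstring(reference_xml).iter("team")
}

# Merge each spreadsheet row with the matching XML details (if any).
combined = [{**row, **details.get(row["team"], {})} for row in csv.DictReader(spreadsheet)]
for row in combined:
    print(row)
```

The match depends on the team names being spelled identically in both sources, which in real data is often the hard part.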

What is RSS?

RSS is a whole family of formats which are essentially based on XML – so they are structured in the same way, containing ‘markup’ that might tell you the author, publication date, location or other details about the information it relates to.

There is a lot of variation between different versions of RSS, but the main thing for the purposes of this post is that the various versions of RSS, and XML, share a structure which journalists can use if they know how to.

Which version isn’t particularly important: as long as you understand the principles, you can adapt what you do to suit the document or feed you’re working with.

Looking at XML and RSS

XML documents (for simplicity’s sake I’ll mostly just refer to ‘XML’ for the rest of this post, although I’m talking about both XML and RSS) contain two things that are of interest to us: content, and information about the content (‘markup’).

Information about the content is contained within tags in angle brackets (also known as chevrons): ‘<’ and ‘>’

For example: <name> or <pubDate> (publication date).

The tag is followed by the content itself, and a closing tag that has a forward slash, e.g. </name> or </pubDate>, so one line might look like this:

<name>Paul Bradshaw</name>

At this point it’s useful to have some XML or RSS in front of you. For a random example go to the RSS feed for the Scottish Government News.

To see the code right-click on that page and select View Source or similar – Firefox is worth using if another browser does not work; the Firebug extension also helps. (Note: if the feed is generated by Feedburner this won’t work: look for the ‘View Feed XML‘ button in the middle right area or add ?format=xml to the feed URL).

What you should see will include the following:

<title>Manufactured Exports Q4 2010</title>
<description>A National Statistics publication for Scotland.</description>
<guid isPermaLink="true">http://www.scotland.gov.uk/News/Releases/2011/04/06100351</guid>
<pubDate>Wed, 06 Apr 2011 00:00:00 GMT</pubDate>

In the RSS feed itself this doesn’t start until line 14 (the first 13 lines are used to provide information about the feed as a whole, such as the version of RSS, title, copyright etc).

But from line 14 onwards this pattern repeats itself for a number of different ‘items’.

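As you can see, each item has a title, a link, a description, a permalink, and a publication date. These are known as child elements: the item is their parent element. (The ‘root element’ is the single outermost element that contains everything else, which in an RSS feed is <rss>.)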
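Here is a short Python sketch that reads the item quoted above and lists its child elements. The `<rss>` and `<channel>` wrapper is a minimal stand-in I have added so the fragment parses on its own; the real feed carries more information in its opening lines.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# The <item> below is the one quoted above; the surrounding <rss> and
# <channel> wrapper is a minimal stand-in for the real feed's opening lines.
feed = """<rss version="2.0"><channel>
<item>
<title>Manufactured Exports Q4 2010</title>
<description>A National Statistics publication for Scotland.</description>
<guid isPermaLink="true">http://www.scotland.gov.uk/News/Releases/2011/04/06100351</guid>
<pubDate>Wed, 06 Apr 2011 00:00:00 GMT</pubDate>
</item>
</channel></rss>"""

root = ET.fromstring(feed)
for item in root.iter("item"):
    # Each child element is one piece of markup about the item.
    child_tags = [child.tag for child in item]
    # pubDate uses the email-style date format, so the standard library can parse it.
    published = parsedate_to_datetime(item.findtext("pubDate"))
    print(child_tags, published.date())
```

Once the publication date is a proper date object rather than a string, sorting items or plotting them on a timeline becomes straightforward.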

More journalistic examples can be found at Mercedes GP’s XML file of the latest F1 Championship Standings (see the PS at the end of Tony Hirst’s post for an explanation of how this is structured), and MySociety’s Parliament Parser, which provides XML files on all parts of government, from MPs and peers to debates and constituencies, going back over a decade. Look at the Ministers XML file in Firefox and scroll down until you get to the first item tagged <ministerofficegroup>. Within each of those are details on ministerial positions. As the Parliament Parser page explains:

“Each one has a date range, the MP or Lord became a minister at some time on the start day, and stopped being one at some time on the end day. The matchid field is one sample MP or Lord office which that person also held. Alternatively, use the people.xml file to find out which person held the ministerial post.”

You’ll notice from that quote that some parts of the XML require cross-referencing to provide extra details. That’s where XML becomes very useful.
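A rough sketch of that kind of cross-referencing follows. Only `<ministerofficegroup>` and the `matchid` field come from the description above; every other tag, attribute and value here is hypothetical and much simpler than the real Parliament Parser files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-ins for the ministers and people files.
ministers_xml = """
<ministers>
  <ministerofficegroup>
    <post title="Example Minister" start="2010-05-12" end="2011-04-11" matchid="office-1"/>
  </ministerofficegroup>
  <ministerofficegroup>
    <post title="Another Example Minister" start="2009-06-05" end="2010-05-11" matchid="office-2"/>
  </ministerofficegroup>
</ministers>
"""
people_xml = """
<people>
  <person name="Person A"><office id="office-1"/></person>
  <person name="Person B"><office id="office-2"/><office id="office-3"/></person>
</people>
"""

# Map every office id to the person who held it.
holder_by_office = {
    office.get("id"): person.get("name")
    for person in ET.fromstring(people_xml).iter("person")
    for office in person.iter("office")
}

# Look up each ministerial post's matchid in that map.
posts = [
    (post.get("title"), holder_by_office.get(post.get("matchid"), "unknown"))
    for post in ET.fromstring(ministers_xml).iter("post")
]
for title, holder in posts:
    print(f"{title}: {holder}")
```

The shared identifier is what makes the join possible: one file says which office a post matches, the other says who held that office.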

Using it in practice: working with XML in Yahoo! Pipes

Yahoo! Pipes provides a good introduction to working with data in XML or RSS. You’ll need to sign up at Pipes.Yahoo.com and click on ‘Create a Pipe‘.

You’ll now be editing a new project. On the left hand column are various ‘modules’ you can use. Click on ‘Sources‘ to expand it, and click and drag ‘Fetch Feed’ onto the graph paper-style canvas.

The 'Fetch Feed' module

Copy the address of your RSS feed and paste it into the ‘Fetch Feed’ box. I’m using this feed of Health information from the UK government.

If you now click on the module so that it turns orange, you should be able (after a few moments) to see that feed in the Debugger window at the bottom of the screen.

Click on the handle in the middle to pull it up and see more, and click on the arrows on the left to drill down to the ‘nested’ data within each item.

Drilling down into the data within an RSS feed

As you drill down you can see elements of data you can filter. In this case, we’ll use ‘region‘.

To filter the feed based on this we need the Filter module. On the left hand side click on ‘Operators‘ to expand that, and then drag the ‘Filter‘ module into the canvas.

Now drag a pipe from the circle at the bottom of the ‘Fetch Feed’ module to the top of the ‘Filter’ module.

Drag a pipe from Fetch Feed to Filter

Wait a moment for the ‘Filter’ module to work out what data the RSS feed contains. Then use the drop down menus so that it reads “Permit items that match all of the following”.

The next box determines which piece of data you will filter on. If you click on the drop-down here you should see all the pieces of data that are associated with each item.

Select the data you are filtering on

We’re going to select ‘region’, and say that we only want to permit items where ‘region’ contains ‘North West’. If any of these don’t make any sense, look at the original RSS feed again to see what they contain.

Now drag a final pipe from the bottom of the ‘Filter’ module to the top of ‘Pipe output‘ at the bottom of the canvas. If you click on either you should be able to see in the Debugger that now only those items relating specifically to the North West are displayed.
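If you prefer to see the same idea in code, the pipe so far does roughly what this Python sketch does. The feed is hypothetical: the `<region>` element mirrors the field chosen in the Filter module, but the items are made up, and the match here is case-sensitive, which may differ from how Pipes behaves.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the <region> element mirrors the field chosen in the
# Filter module, but the items themselves are made up.
feed = """<rss version="2.0"><channel>
<item><title>New clinic opens</title><region>North West</region></item>
<item><title>Ward refurbished</title><region>London</region></item>
<item><title>Screening programme extended</title><region>North West, Yorkshire</region></item>
</channel></rss>"""


def permit_all(items, rules):
    """Keep items where every (field, text) rule matches: field contains text."""
    return [
        item for item in items
        if all(text in (item.findtext(field) or "") for field, text in rules)
    ]


items = list(ET.fromstring(feed).iter("item"))
north_west = permit_all(items, [("region", "North West")])
titles = [item.findtext("title") for item in north_west]
print(titles)
```

Adding a second rule to the list narrows the results further, in the same way as adding another row to the Filter module with ‘all’ selected.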

If you wanted to you could now save this and click ‘Run Pipe‘ to see the results. Once you do you should notice options to ‘Get as RSS‘ – this would allow you to subscribe to this feed yourself or publish it on a website or Twitter account. There’s also ‘Get as JSON’ which is a whole other story – I’ll cover JSON in a future post.

You can see this pipe in action – and clone it yourself – here.

Oh, and a sidenote: if you wanted to grab an XML file in Yahoo! Pipes rather than an RSS feed, you would use ‘Fetch Data’ instead of ‘Fetch Feed’.

Just the start

There’s much more you can do here. Some suggestions for next steps:

Those are for future posts. For now I just want to demonstrate how XML works to add information-about-information which you can then use to search, filter, and combine data.

And it’s not just an esoteric language that is used by a geeky few as part of their newsgathering: journalists at Sky News, The Guardian and The Financial Times – to name just a few – all use this as a routine part of publishing, because it provides a way to dynamically update elements within a larger story without having to update the whole thing from scratch – for example by updating casualty numbers or new dates on a timeline.

And while I’m at it, if you have any examples of XML being used in journalism for either newsgathering or publishing, let me know.
