Tumblelog by Soup.io

May 27 2011

14:46

#newsrw: Making sense of the numbers in data journalism

The next big developments in data journalism are live data and getting your audience involved, according to Martin Stabe, an interactive producer at FT.com.

He was one of four data journalists sharing what is in the data journalism toolkit: advice on tools, many of them free, and on how to find data and clean it.

James Ball, a data journalist on the Guardian investigations team who worked on the WikiLeaks cables, discussed the “use and abuse of statistics”.

He showed “a really awful infographic” on the amount of water it takes to make a pizza and a slice of bread.

“You don’t have to do much research to realise that is just tosh,” he said.

“We have to sense-check numbers.” He gave the example of culture secretary Jeremy Hunt citing an unrealistic figure of two billion expected TV viewers for the royal wedding; the estimated audience was 300 million.

He asked: “Why might it matter?” And explained the dangers of bad statistics and bad journalism. “The best bit of your toolkit is understanding a bit of maths,” he advised.

Kevin Anderson, data journalism trainer and digital strategist, who trained as a journalist in the US, gave more tips on tools. One of the revolutions is access to data; the other is access to tools, he said.

One tool in his kit is Google Spreadsheets, part of Google Docs, which Anderson used when he was at the Guardian; he also recommended the OUseful blog.

“You can import a live data feed,” he said, and he suggested collecting your own data in a form. You can ask questions, including multiple choice, and embed the form into a story.

For easy mapping tools he advised Google and Zeemaps. Once you have the data, he said, the next process is “link scraping”.

You can “grab data” from existing sources. He gave the example of Outwit Hub, a Firefox plugin that allows you to pull in links, with their URLs, from any search and then export them as a Google Spreadsheet or SQL.
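The link-scraping idea described above can be sketched with Python’s standard library alone, without the Outwit Hub plugin. The sample HTML and CSV layout here are illustrative assumptions, not any tool’s actual output.

```python
# A minimal link-scraping sketch: collect every <a href> from a page
# and export the results as CSV, ready for a spreadsheet.
from html.parser import HTMLParser
import csv
import io


class LinkScraper(HTMLParser):
    """Collects (anchor_text, url) pairs from every <a href> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []       # list of (anchor_text, url) tuples
        self._href = None     # href of the <a> tag currently open, if any
        self._text = []       # text fragments inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html):
    """Scrape the links out of an HTML string and return CSV text."""
    scraper = LinkScraper()
    scraper.feed(html)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["text", "url"])
    writer.writerows(scraper.links)
    return out.getvalue()


sample = ('<p><a href="http://example.com/a">Story A</a> '
          '<a href="http://example.com/b">Story B</a></p>')
print(links_to_csv(sample))
```

In practice the HTML would come from fetching a live page rather than a string literal; the parsing and export steps stay the same.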

Anderson also recommended tools that pull structured data out of text. He gave the example of OpenCalais, a Thomson Reuters tool, which “allows you to see patterns in your own coverage” and connections between stories.

He also pointed journalists towards ThinkMap and gave the example of ‘Who Runs Hong Kong’, a data visualisation showing the connections of power.

“The ability for news organisations to extract more value through data journalism is a huge opportunity,” he said.

Martin Stabe, interactive producer at FT.com, who, like Anderson, is originally from the States, described how data-driven news stories at FT.com are handled by a team.

He explained the team consists of a reporter, “who really knows the story”, a producer, like Stabe, a designer and a developer.

“One of the best things you can do in your newsroom is to get your head round administrative geography,” he said, and to understand statistical data.

He said it is very difficult to get data on all local authorities: on when they hold local elections and on how their public spending is changing. Local data is often coded in different ways, he explained, giving the example of the “Cuts of £6bn hit the elderly the hardest” report on FT.com.

When you have a large dataset you need to ask questions. But data may be “dirty”, with a mix of local coding conventions.

“The very act of cleaning the data is the key step,” he said.
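A toy illustration of that cleaning step: local authorities often appear under inconsistent names, so one early job is to collapse the variants onto a single canonical label. The names and alias table below are invented examples, not real FT data.

```python
# Normalise messy local-authority labels onto canonical names
# using a hand-built alias table.
def normalise_authority(raw, aliases):
    """Map a messy authority label onto one canonical name."""
    key = raw.strip().lower().replace(".", "")
    return aliases.get(key, key.title())


# Hypothetical alias table, built up while inspecting the raw data.
ALIASES = {
    "b'ham": "Birmingham",
    "birmingham cc": "Birmingham",
    "city of birmingham": "Birmingham",
}

dirty = ["  B'ham ", "Birmingham CC", "Leeds"]
clean = [normalise_authority(r, ALIASES) for r in dirty]
print(clean)  # every spelling of Birmingham collapses to one label
```

The alias table grows as you find new variants, which is why cleaning is a manual, investigative step rather than a one-off script.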

“Data is only useful if it’s personal”, Bella Hurrell of the BBC recently said on Paul Bradshaw’s blog, a point echoed by Stabe, who gave the example of data on how likely a 16-year-old receiving free school lunches is to get good or bad GCSE results.

He pointed out that readers are usually only interested in one area or one school, so an interactive version lets people drill down. The data journalism steps are to obtain, warehouse and publish the data.

In obtaining the data, “sometimes we ask for it nicely” Stabe said, but usually the FT scrapes the data, and it then goes into a database.

His tips for journalists include learning how to manipulate text in Excel.
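The Excel text operations Stabe recommends (TRIM, Text to Columns, LEFT, RIGHT) have direct equivalents in any scripting language. A sketch in Python, using an invented sample cell value:

```python
# Spreadsheet-style text manipulation on a single messy cell value.
cell = "  E09000007 : Camden  "

value = cell.strip()                                # TRIM
code, name = [p.strip() for p in value.split(":")]  # Text to Columns on ":"
prefix = code[:3]                                   # LEFT(code, 3)
suffix = code[-4:]                                  # RIGHT(code, 4)
print(code, name, prefix, suffix)
```

The same slicing and splitting applied down a whole column turns a dirty import into something you can sort, join and aggregate.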

Next came advice from Simon Rogers, editor of the Guardian’s Data Store and Datablog.

Newspapers are all about the geography of the newsroom, he said, describing how he sits beside the investigations team and news desk.

He spoke about the difficulty in getting usable public data and dealing with the government’s “annual resource accounts”.

The Guardian is now providing ordered data to the people in government who supplied it, he explained.

The Guardian’s data workflow draws on: data it gets sent, data from breaking news, recurring events and “theories to be exploited”. The journalists then have to think about how to “mash it together”, as combining datasets makes them more interesting.

A couple of Rogers’ tips are to use ManyEyes and Google Spreadsheets, but “sometimes numbers alone are interesting enough to make a story,” he said.

He gave the example of a map made using Google Fusion Tables showing “patterns of awfulness”: every death in Iraq mapped. It took about half an hour to build.

More recent examples include using data provided on the Nato Libya website, which publishes a daily archive of what happens each day, including data on missions.

Every day they add the NATO data to a map to show visually what has been hit and where. It can also generate stories as journalists notice patterns.

10:46

LIVE: Session 1A – The data journalism toolkit

We have Matt Caines and Ben Whitelaw from Wannabe Hacks liveblogging for us at news:rewired all day. You can follow session 1A ‘The data journalism toolkit’, below.

Session 1A features: Kevin Anderson, data journalism trainer and digital strategist; James Ball, data journalist, Guardian investigations team; Martin Stabe, interactive producer, FT.com; and Simon Rogers, editor, Guardian Datablog and Data Store. Moderated by David Hayward, head of the journalism programme, BBC College of Journalism.

news:rewired – Session 1A: The data journalism kit

December 16 2010

15:04

LIVE: The digital production desk

We’ll have Matt Caines and Nick Petrie from Wannabe Hacks liveblogging for us at news:rewired all day. Follow individual posts on the news:rewired blog for up to date information on all our sessions.

We’ll also have blogging over the course of the day from freelance journalist Rosie Niven.

October 22 2010

11:00

Help Me Investigate – anatomy of an investigation

Earlier this year Andy Brightwell and I conducted some research into one of the successful investigations on my crowdsourcing platform Help Me Investigate. I wanted to know what had made the investigation successful – and how (or if) we might replicate those conditions for other investigations.

I presented the findings (presentation embedded above) at the Journalism’s Next Top Model conference in June. This post sums up those findings.

The investigation in question was ‘What do you know about The London Weekly?’ – an investigation into a free newspaper that was (they claimed – part of the investigation was to establish if this was a hoax) about to launch in London.

The people behind the paper had made a number of claims about planned circulation, staffing and investment that most of the media reported uncritically. Martin Stabe, James Ball and Judith Townend, however, wanted to dig deeper. So, after an exchange on Twitter, Judith logged onto Help Me Investigate and started an investigation.

A month later members of the investigation had unearthed a wealth of detail about the people behind The London Weekly and the facts behind their claims. Some of the information was reported in MediaWeek and The Media Guardian podcast Media Talk; some formed the basis for posts on James Ball’s blog, Journalism.co.uk and the Online Journalism Blog. Some has, for legal reasons, remained unpublished.

A note on methodology

Andrew conducted a number of semi-structured interviews with contributors to the investigation. The sample was randomly selected but representative of the mix of contributors, who were categorised as either ‘alpha’ contributors (over six contributions), ‘active’ (2-6 contributions) or ‘lurkers’ (whose only contribution was to join the investigation). These interviews formed the qualitative basis for the research.
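The three contributor categories above reduce to a simple bucketing rule on contribution counts. A sketch, with hypothetical usernames and counts:

```python
# Bucket users into the research's contributor categories:
# 'alpha' (over 6 contributions), 'active' (2-6), 'lurker' (joined only).
def categorise(contributions):
    if contributions > 6:
        return "alpha"
    if contributions >= 2:
        return "active"
    return "lurker"


# Hypothetical contribution counts per user.
counts = {"judith": 14, "james": 9, "user3": 3, "user4": 1}
categories = {user: categorise(n) for user, n in counts.items()}
print(categories)
```

Applied to the site analytics, a rule like this gives the denominator for each category when sampling interviewees.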

Complementing this data was quantitative information about users of the site as a whole. This was taken from two user surveys – one when the site was three months old and another at 12 months – and from analysis of analytics for the investigation (such as numbers and types of actions, frequency, etc.).

What are the characteristics of a crowdsourced investigation?

One of the first things I wanted to analyse was whether the investigation data matched up to patterns observed elsewhere in crowdsourcing and online activity. An analysis of the number of actions by each user, for example, showed a clear ‘power law’ distribution, where a minority of users accounted for the majority of activity.

This power law, however, did not translate into a breakdown approaching the 90-9-1 ‘law of participation inequality’ observed by Jakob Nielsen. Instead, the balance between those who made a couple of contributions (normally the 9% of the 90-9-1 split) and those who made none (the 90%) was roughly equal. This may have been because the design of the site meant it was not possible to ‘lurk’ without already being a member of the site, or being invited and signing up.

Adding in data on those looking at the investigation page who were not members may have shed further light on this.

What made the crowdsourcing successful?

Clearly, it is worth making a distinction between what made the investigation successful as a series of outcomes, and what made crowdsourcing successful as a method.

What made the community gather, and continue to return? One hypothesis was that the nature of the investigation provided a natural cue to interested parties – The London Weekly was published on Fridays and Saturdays and there was a build up of expectation to see if a new issue would indeed appear.

I was curious to see if the investigation had any ‘rhythm’. Would there be peaks of interest correlating to the expected publication?

The data threw up something else entirely. There was indeed a rhythm but it was Wednesdays that were the most popular day for people contributing to the investigation.

Why? Well, it turned out that one of the investigation’s ‘alpha’ contributors – James Ball – set himself a task to blog about the investigation every week. His blog posts appeared on a Wednesday.
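Finding that rhythm is straightforward once you have the analytics: group contribution timestamps by weekday and look for the peak. The dates below are invented stand-ins for the site’s actual data.

```python
# Count contributions per weekday and report the busiest day.
from collections import Counter
from datetime import date

# Hypothetical contribution dates pulled from site analytics.
contributions = [
    date(2010, 2, 3), date(2010, 2, 10), date(2010, 2, 10),  # Wednesdays
    date(2010, 2, 5), date(2010, 2, 12),                     # Fridays
]

by_day = Counter(d.strftime("%A") for d in contributions)
peak_day, peak_count = by_day.most_common(1)[0]
print(peak_day, peak_count)
```

The same grouping by hour or by week would reveal other cycles, such as whether activity tracked the paper’s Friday publication or James’s Wednesday blog posts.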

That this turned out to be a significant factor in driving activity tells us one important lesson: talking publicly and regularly about the investigation’s progress is key.

This data was backed up from the interviews. One respondent mentioned the “weekly cue” explicitly.

More broadly, it seems that the site helped keep track of a number of discussions taking place around the web. Having been born from a discussion on Twitter, further conversations on Twitter resulted in further people signing up, along with comments threads and other online discussion. This fit the way the site was designed culturally – to be part of a network rather than asking people to do everything on-site.

But the planned technical connectivity of the site with the rest of the web (being able to pull in related tweets or bookmarks, for example) had been dropped during development as we focused on core functionality. This was not a bad thing, I should emphasise, as it prevented us becoming distracted with ‘bells and whistles’ and allowed us to iterate in reaction to user activity rather than our own assumptions of what users would want. This research captures that user activity and informs future development accordingly.

The presence of ‘alpha’ users like James and Judith was crucial in driving activity on the site – a pattern observed in other successful investigations. They picked up the threads contributed by others and not only wove them together into a coherent narrative that allowed others to enter more easily, but also set the new challenges that provided ways for people to contribute. The fact that they brought with them a strong social network presence is probably also a factor – but one that needs further research.

The site has always been designed to emphasise the role of the user in driving investigations. The agenda is not owned by a central publisher, but by the person posing the question – and therefore the responsibility is theirs as well. In this sense it draws on Jenkins’ argument that “Consumers will be more powerful within convergence culture – but only if they recognise and use that power.” This cultural hurdle may be the biggest one that the site has to address.

Indeed, the site is also designed to offer “Failure for free”, allowing users to learn what works and what doesn’t, and begin to take on that responsibility where required.

The investigation also suited crowdsourcing well, as it could be broken down into separate parts and paths – most of which could be completed online: “Where does this claim come from?” “Can you find out about this person?” “What can you discover about this company?”. One person, for example, used Google Streetview to establish that the registered address of the company was a postbox.

Other investigations that are less easily broken down may be less suitable for crowdsourcing – or require more effort to ensure success.

A regular supply of updates provided the investigation with momentum. The accumulation of discoveries provided valuable feedback to users, who then returned for more. In his book on Wikipedia, Andrew Lih (2009 p82) notes a similar pattern – ‘stigmergy‘ – that is observed in the natural world: “The situation in which the product of previous work, rather than direct communication [induces and directs] additional labour”. An investigation without these ‘small pieces, loosely joined’ might not suit crowdsourcing so well.

One problem, however, was that those paths led to a range of potential avenues of enquiry. In the end, although the core questions were answered (was the publication a hoax and what were the bases for their claims) the investigation raised many more questions.

These remained largely unanswered once the majority of users felt that their questions had been answered. Like any investigation, there came a point at which those involved had to make a judgement whether they wished to invest any more time in it.

Finally, the investigation benefited from a diverse group of contributors who contributed specialist knowledge or access. Some physically visited stations where the newspaper was claiming distribution to see how many copies were being handed out. Others used advanced search techniques to track down details on the people involved and the claims being made, or to make contact with people who had had previous experiences with those behind the newspaper.

The visibility of the investigation online led to more than one ‘whistleblower’ approach providing inside information.

What can be done to make it better?

Looking at the reasons that users of the site as a whole gave for not contributing to an investigation, the majority attributed this to ‘not having enough time’. Although at least one interviewee, in contrast, highlighted the simplicity and ease of contributing, contributing needs to be made as easy and simple as possible in order to lower the perceived effort and time required.

Notably, the second biggest reason for not contributing was a ‘lack of personal connection with an investigation’, demonstrating the importance of the individual and social dimension of crowdsourcing. Likewise, a ‘personal interest in the issue’ was the single largest factor in someone contributing. A ‘Why should I contribute?’ feature on each investigation may be worth considering.

Others mentioned the social dimension of crowdsourcing – the “sense of being involved in something together” – what Jenkins (2006) would refer to as “consumption as a networked practice”.

This motivation is also identified by Yochai Benkler in his work on networks. Looking at non-financial reasons why people contribute their time to online projects, he refers to “socio-psychological reward”. He also identifies the importance of “hedonic personal gratification”. In other words, fun. (Interestingly, these match two of the three traditional reasons for consuming news: because it is socially valuable, and because it is entertaining. The third – because it is financially valuable – neatly matches the third reason for working).

While it is easy to talk about “Failure for free”, more could be done to identify and support failing investigations. We are currently developing a monthly update feature that would remind users of recent activity and – more importantly – the lack of activity. The investigators in a group might be asked whether they wish to terminate the investigation in those cases, emphasising their role in its progress and helping ‘clean up’ the investigations listed on the first page of the site.

That said, there is also a danger in interfering too much to reduce failure. This is a natural instinct, and I have to continually remind myself that I started the project with an expectation of 95-99% of investigations ‘failing’ through a lack of motivation on the part of the instigator. That was part of the design. It was the 1-5% of questions that gained traction that would be the focus of the site (this is how Meetup works, for example – most groups ‘fail’, but there is no way to predict which ones. As it happens, the ‘success’ rate of investigations has been much higher than expected). One analogy is a news conference where members throw out ideas – only a few are chosen for investment of time and energy; the rest ‘fail’.

In the end, it is the management of that tension between interfering to ensure everything succeeds – and so removing the incentive for users to be self-motivated – and not interfering at all – leaving users feeling unsupported and unmotivated – that is likely to be the key to a successful crowdsourcing project. More than a year into the project, this is still a skill that I am learning.

July 29 2010

10:47

The New Online Journalists #7: Dave Lee

As part of an ongoing series on recent graduates who have gone into online journalism, Dave Lee talks about how he won a BBC job straight from university, what it involves, and what skills he feels online journalists need today.

I got my job as a result – delightfully! – of having a well-known blog. Well, that is, well-known in the sense it was read by the right people. My path to the BBC began with a work placement at Press Gazette – an opportunity I wouldn’t have got had it not been for the blog. In fact, I recall Patrick Smith literally putting it in those terms – saying that they’d never normally take an undergrad without NUJ qualifications – but they’d seen my blog and liked what I was doing.

I met Martin Stabe there, and worked closely with him on a couple of projects – including the Student Journalism Blog on their site.

Martin knew Nick Reynolds – social media executive at the BBC – and when he heard a blogger was needed for the BBC Internet Blog, my name was passed on. That door into the BBC then made it much easier to progress upwards to the newsroom.

My job is to write news and features for BBC News Online, based on output from the BBC World Service.

There wasn’t much in my course [at Lincoln University] which directly relates to the skills I use now – much has been learnt on the job – but there is a certain level of law knowledge, ethics and general good practice that has proved to be invaluable – and that came from my studies.

Of course, it’s always worth stressing that my blog was able to succeed because of my flexibility to write about my studies and people met via work at my university. So while studying didn’t perhaps give me the practical skills for my day-to-day job, it certainly has helped me be a good journalist in other, less measurable ways.

It’s hard to predict how my job will develop in the future. Within the BBC, it’s pretty crucial to make sure we share our best stuff – it’s no good having two sets of BBC journos (or more…) chasing the same stories and sources. Jobs like mine help solve that situation.

