Tumblelog by Soup.io

April 03 2013

22:05

Intercontinental collaboration: How 86 journalists in 46 countries can work on a single investigation

piggy-bank-offshore-banking-beach

On Thursday morning, the International Consortium of Investigative Journalists will begin releasing detailed reports on the workings of offshore tax havens. A little over a year ago, 260 gigabytes of data were leaked to ICIJ executive director Gerard Ryle; they contained information about the finances of individuals in over 170 countries.

Ryle was a media executive in Australia at the time he received the data, says deputy director Marina Walker Guevara. “He came with the story under his arm.” Walker Guevara says the ICIJ was surprised Ryle wanted a job in their small office in Washington, but soon realized that it was only through their international scope and experience with cross-border reporting that the Offshore Project could be executed. The result is a major international collaboration that has to be one of the largest in journalism history.

“It was a huge step. As reporters and journalists, the first thing you think is not ‘Let me see how I can share this with the world.’ You think: ‘How can I scoop everyone else?’ The thinking here was different.” Walker Guevara says the ICIJ seriously considered keeping the team to a core five or six members, but ultimately decided to go with the “most risky” approach when they realized the enormous scope of the project: Journalists from around the world were given lists of names to identify and, if they found interesting connections, were given access to Interdata, the secure, searchable, online database built by the ICIJ.

Just as the rise of information technology has allowed new competition for the attention of audiences, it’s also enabled traditional news organizations to partner in what can sometimes seem like dizzyingly complex relationships. The ICIJ says this is the largest collaborative journalism project they have ever organized, with the most comparable one involving a team of 25 cross-border journalists.

In the end, the Offshore Project brings together 86 journalists from 46 countries into an ongoing reporting collaboration. German and Canadian news outlets (Süddeutsche Zeitung, Norddeutscher Rundfunk, and the CBC) will be among the first to report their findings this week, with The Washington Post beginning their report on April 7, just in time for Tax Day. Reporters from more than 30 other publications also contributed, including Le Monde, the BBC and The Guardian. (The ICIJ actually published some preliminary findings in conjunction with the U.K. publications as a teaser back in November.)

“The natural step wasn’t to sit in Washington and try to figure out who is this person and why this matters in Azerbaijan or Romania,” Walker Guevara said, “but to go to our members there — or a good reporter if we didn’t have a member — give them the names, invite them into the project, see if the name mattered, and involve them in the process.”

Defining names that matter was a learning experience for the leaders of the Offshore Project. Writes Duncan Campbell, an ICIJ founder and current data journalism manager:

ICIJ’s fundamental lesson from the Offshore Project data has been patience and perseverance. Many members started by feeding in lists of names of politicians, tycoons, suspected or convicted fraudsters and the like, hoping that bank accounts and scam plots would just pop out. It was a frustrating road to follow. The data was not like that.

The data was, in fact, very messy and unstructured. Between a bevy of spreadsheets, emails, PDFs without OCR, and pictures of passports, the ICIJ still hasn’t finished mining all the data from the raw files. Campbell details the complicated process of cleaning the data and sorting it into a searchable database. Using NUIX software licenses granted to the ICIJ for free, it took a British programmer two weeks to build a secure database that would allow all of the far-flung journalists not only to safely search and download the documents, but also to communicate with one another through an online forum.

“Once we went to these places and gathered these reporters, we needed to give them the tools to function as a team,” Walker Guevara said.

Even so, some were so overwhelmed by the amount of information available, and so unaccustomed to hunting for stories in a database, that the ICIJ ultimately hired a research manager to do searches for reporters and send them the documents via email. “We do have places like Pakistan where the reporters didn’t have much Internet access, so it was a hassle for him,” says Walker Guevara, adding that there were also security concerns. “We asked him to take precautions and all that, and he was nervous, so I understand.”

They also had to explain to each of the reporting teams that they weren’t simply on the lookout for politicians hiding money and people who had broken the law. “First, you try the name of your president. Then, your biggest politician, former presidents — everybody has to go through that,” Walker Guevara says. While a few headline names did eventually appear — Imelda Marcos, Robert Mugabe — she says some of the most surprising stories came from observing broader trends.

“Alongside many usual suspects, there were hundreds of thousands of regular people — doctors and dentists from the U.S.,” she says. “It made us understand a system that is a lot more used than what you think. It’s not just people breaking the law or politicians hiding money, but a lot of people who may feel insecure in their own countries. Or hiding money from their spouses. We’re actually writing some stories about divorce.”

In the 2 million records they accessed, ICIJ reporters began to get an understanding of the methods account holders use to avoid association with these accounts. Many use “nominee directors,” a process which Campbell says is similar to registering a car in the name of a stranger. But in their post about the Offshore Project, the ICIJ team acknowledges that, to a great extent, most of the money being channeled through offshore accounts and shell companies is actually not being used for illegal transactions. Defenders of the offshore banks say they “allow companies and individuals to diversify their investments, forge commercial alliances across national borders, and do business in entrepreneur-friendly zones that eschew the heavy rules and red tape of the onshore world.”

Walker Guevara says that, while that can be true, the “parallel set of rules” that governs the offshore world so disproportionately favors the elite, wealthy few as to be unethical. “Regulations, bureaucracy, and red tape are bothersome,” she says, “but that’s how democracy works.”

Perhaps the most interesting question surrounding the Offshore Project, however, is how to get traditional shoe-leather journalists up to speed on an international story that involves intensive data crunching. Walker Guevara says it’s all about recognizing when the numbers cease to be interesting on their own and putting them in global context. Ultimately, while it’s rewarding to be able to trace dozens of shell companies to a man accused of stealing $5 billion from a Russian bank, someone has to be able to connect the dots.

“This is not a data story. It was based on a huge amount of data, but once you have the name and you look at your documents, you can’t just sit there and write a story,” says Walker Guevara. “That’s why we needed reporters on the ground. We needed people checking courthouse records. We needed people going and talking to experts in the field.”

All of the stories that result from the Offshore Project — some of which could take up to a year to be published — will live on a central project page at ICIJ.org. The team is also considering creating a web app that will allow users to explore some (though probably not all) of the data. In terms of the unique tools they built, Walker Guevara says most are easily replicable by anyone using NUIX or dtSearch software, but they won’t be open sourced. Other lessons from the project, like the inherent vulnerability of PGP encryption and “other complex cryptographic systems popular with computer hackers,” will endure.

“I think one of the most fascinating things about the project was that you couldn’t isolate yourself. It was a big temptation — the data was very addictive,” Walker Guevara says. “But the story worked because there was a whole other level of traditional reporting that was going and checking public records, going and seeing — going places.”

Photo by Aaron Shumaker used under a Creative Commons license.

October 25 2010

18:25

5 Ways to Improve the Non-Profit Journalism Hub

The Voice of San Diego, one of the oldest of the new guard of non-profit news orgs that have been popping up, has teamed up with some academics from San Diego State University to launch The Hub, a handy database of information about non-profit community news organizations. If you're looking to start your own non-profit news org or want to learn more about what's already out there, this is the place to start. 

Megan Garber over at NiemanLab has a detailed rundown of who and what is involved.


I'm a big fan of things that solve problems, and The Hub clearly does that. Voice of San Diego CEO Scott Lewis told Garber the site was created in response to "many, many occasions in which VOSD execs and editors found themselves fielding requests for consulting and advice from people hoping to start their own non-profit news sites."

I spent some time cruising around and think it shows a lot of promise. I've also got five ideas for how it could be made even better and more useful.

Inside The Hub

The piece that I'm most interested in is the simple directory of existing non-profit news orgs that The Hub has put into motion. This is a great idea. Structured directories are almost always awesome. The Hub's directory is pretty simple, currently listing just 13 organizations that qualify as non-profit, community-based news organizations. All the big players you usually read about in stories are there: New Haven Independent, Texas Tribune, Bay Citizen, etc. Each profile page includes a quick rundown on the org's background and then a short Q and A with someone from the organization answering basic questions about its goals and origins.

It might not sound like much, but this is really useful stuff for people looking to learn more about this area. That said, there are a few ways these profiles could be improved on to make the site as a whole much more useful:

  1. More structured data -- I'd love to see The Hub focus more on structured data over narrative. The interviews I read were fairly interesting, but the ability to take in all the important details about an organization at a glance is more valuable than the ability to read a Q & A that may or may not contain the same information. What I'd love to see would be for The Hub to borrow a page from CrunchBase in how all the data is structured and links to clickable search results. An emphasis on getting more structured data would be a bigger win than getting more narrative info on these profile pages.
  2. Funding information -- The biggest piece of structured data missing is the funding for each organization. As a reader, I want to know how much funding each news org has received so far and what the source of it is. From my own reading, I know that there's a vast disparity in funding levels between some of these organizations. Visitors need to be able to see this at a glance so they can put the rest of the information into the proper context.
  3. Rundown on key personnel -- Similarly, the structured data for each news org should include the names of the top editors and the publisher of each organization. These pages could link to "people" pages on The Hub, or they could just link out to LinkedIn profiles or Twitter accounts. Either way, people will want to know who's in charge at these news orgs so they can get a better sense of what they're doing and how they're doing it.
  4. Subscriber/follower counts for social media accounts -- The Hub's profile pages helpfully link out to the social media accounts for each news organization. What they don't tell you, however, is how many followers that news organization has right now. This might seem like a small thing, but it could actually be very useful information if acquired automatically. It would be great to be able to rank non-profit news orgs based on how many followers they have on Twitter, or by number of fans on Facebook, for example.
  5. Info on how freelancers can pitch them and how interested parties can support them -- My final suggestion would be for The Hub's profile pages to prominently include information aimed at freelancers looking to learn more about how to pitch non-profit news organizations and for fans and avid readers looking for how to support these new enterprises and their work. These are two use cases I think will be pretty common among visitors to The Hub and they don't appear to be addressed specifically on the profile pages.
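
To make the first four suggestions concrete, here is a minimal sketch of what a structured profile record could look like, and how follower counts could drive a ranking. Everything in it is an assumption for illustration: the `NewsOrgProfile` fields, the `rank_by_followers` and `summary_line` helpers, and the placeholder organizations and numbers are invented, and nothing here reflects how The Hub actually stores its data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NewsOrgProfile:
    """Hypothetical at-a-glance record for one non-profit news org."""
    name: str
    total_funding_usd: Optional[int] = None                      # None means "not listed"
    funding_sources: List[str] = field(default_factory=list)
    key_personnel: Dict[str, str] = field(default_factory=dict)  # role -> person
    followers: Dict[str, int] = field(default_factory=dict)      # network -> count


def rank_by_followers(profiles: List[NewsOrgProfile], network: str) -> List[NewsOrgProfile]:
    """Return profiles sorted by follower count on one network, largest first.

    Orgs with no count recorded for that network are treated as 0.
    """
    return sorted(profiles, key=lambda p: p.followers.get(network, 0), reverse=True)


def summary_line(profile: NewsOrgProfile) -> str:
    """Build a one-line summary a directory page could show above the Q and A."""
    if profile.total_funding_usd is None:
        funding = "funding not listed"
    else:
        funding = f"${profile.total_funding_usd:,} raised"
    editor = profile.key_personnel.get("editor", "editor not listed")
    return f"{profile.name}: {funding}; editor: {editor}"


if __name__ == "__main__":
    # Placeholder organizations and numbers, invented for illustration only.
    orgs = [
        NewsOrgProfile("Example Org A", 250_000, ["Foundation X"],
                       {"editor": "A. Editor"}, {"twitter": 1_200}),
        NewsOrgProfile("Example Org B", None, [], {}, {"twitter": 4_800}),
    ]
    for org in rank_by_followers(orgs, "twitter"):
        print(summary_line(org))
```

Keeping funding and follower counts as plain numbers rather than prose is what makes sorting and side-by-side comparison possible; the narrative Q and A can still sit alongside the record.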

The Hub is a useful project off to a great start. People working on the edges of journalism need more projects like these that give shape and voice to what's happening in the field. I look forward to seeing how this develops.

July 28 2010

10:41

BBC moves to more structured data in its relaunch

code behind BBC pages

Behind the story of the BBC website’s recent relaunch is, among other things, an update to their content management system. In a post on the changes, John O’Donovan explains how the changes mean that webpages will have a more structured and semantic quality:

“We will … no longer be using tables to layout the content, instead we will be rendering the pages using CSS layout and only using tables for data.

“There are lots of reasons to do this, but some include making the content more efficient, more standards compliant and faster to render. It also allows us to publish semantic XHTML, which means that content blocks are better marked up to describe what they are and has benefits like creating a better header structure to help screen readers.

“Better structure also means you will see a more consistent presentation of stories in Google and search engines with, for example, story dates and author information showing more clearly.

“This reflects a new content model which is now largely based around a simple and generic data model of assets and groups of assets which are typed (meaning we don’t just manage blocks of content, we use metadata to describe what is in the blocks of content) and publishing through templates and services based around Velocity.”
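
The "typed assets and groups of assets" idea in that last quote can be sketched roughly as follows. This is an illustration only, not the BBC's implementation: the `Asset` and `AssetGroup` classes, their field names, and the markup in `STORY_TEMPLATE` are all assumptions, and Python's standard-library `string.Template` stands in for Velocity, which is a Java template engine.

```python
from dataclasses import dataclass, field
from html import escape
from string import Template
from typing import Dict, List


@dataclass
class Asset:
    """One typed block of content plus metadata describing what it is."""
    asset_type: str                 # e.g. "story" or "image" (hypothetical types)
    metadata: Dict[str, str]
    body: str = ""


@dataclass
class AssetGroup:
    """A typed collection of assets, e.g. everything on one index page."""
    group_type: str
    assets: List[Asset] = field(default_factory=list)

    def of_type(self, asset_type: str) -> List[Asset]:
        """Return only the assets in this group with the given type."""
        return [a for a in self.assets if a.asset_type == asset_type]


# Hypothetical template: headings and classes describe the content,
# and no tables are used for layout.
STORY_TEMPLATE = Template(
    '<div class="story">\n'
    '  <h2>$headline</h2>\n'
    '  <p class="byline">$author, $date</p>\n'
    '  <p>$body</p>\n'
    '</div>'
)


def render_story(asset: Asset) -> str:
    """Render a story asset to a semantic markup fragment, escaping all text."""
    if asset.asset_type != "story":
        raise ValueError(f"cannot render asset of type {asset.asset_type!r} as a story")
    return STORY_TEMPLATE.substitute(
        headline=escape(asset.metadata["headline"]),
        author=escape(asset.metadata["author"]),
        date=escape(asset.metadata["date"]),
        body=escape(asset.body),
    )
```

Because the type and metadata travel with each asset, the same story could be passed through a different template (a feed, a mobile page) without touching the content itself.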

In addition, code that now looks like the image above will mean that the site is better optimised for search engines (as if a PageRank of 9 wasn’t good enough), more accessible, and easier for developers to do interesting things with BBC content.

On the subject of SEO, the site is simplifying URLs but still won’t be including descriptive words in them, although “there is more work to do yet on how we might use even shorter URLs (such as http://www.bbc.co.uk/10250603) and longer more descriptive ones http://www.bbc.co.uk/story-about-something-interesting.”
