July 01 2013

14:57

Monday Q&A: Denise Malan on the new data-driven collaboration between INN and IRE

Every news organization wishes it could have more reporters with data skills on staff. But not every news organization can afford to make data a priority — and even those that do can struggle to find the right candidates.

A new collaboration between two journalism nonprofits — the Investigative News Network and Investigative Reporters and Editors — aims to address this allocation issue. Denise Malan, formerly an investigative and data reporter at the Corpus Christi Caller-Times, will fill the new role of INN director of data services, offering “dedicated data-analysis services to INN’s membership of more than 80 nonprofit investigative news organizations,” many of them three- or four-person teams that can’t find room or funding for a dedicated data reporter.

It’s a development that could both strengthen the investigative work being done by these institutions and build skills around data analysis in journalism. Malan has experience training journalists to procure, clean, and analyze data, and she has high hopes for the kinds of stories and networked reporting this collaboration will produce. We talked about IRE’s underutilized data library, potentially disruptive Supreme Court decisions around freedom of information, the unfortunate end for wildlife wandering onto airport runways, and what it means to translate numbers into stories.

O’Donovan: How does someone end up majoring in physics and journalism?
Malan: My freshman year they started a program to do a bachelor of arts in physics. Physics Lite. And you could pair that with business or journalism or English — something that was really your major focus of study, but the B.A. in physics would give you a good science background. So you take physics, you take calculus, you take statistics, and that really gives you the good critical thinking and data background to pair with something else — in my case, journalism.
O’Donovan: I guess it’s kind of easy to see how that led into what you’re doing now. But did you always see them going hand in hand? Or is that something that came later?
Malan: In college, I thought I was going to be a science writer. That was the main reason I paired those. When I got into news and started going down the path of data journalism, I was very glad to have that background, for sure. But I started getting more into the data journalism world when the Caller-Times in Corpus Christi sent me to the IRE bootcamp, an intensive week where you concentrate on learning Excel and Access and the different pitfalls you can face in data — some basic cleaning skills. That’s really what got me started in the data journalism realm. And then the newspaper continued to send me to training — to the CAR conferences every year and local community college classes to beef up my skills.
O’Donovan: So, how long were you at the Caller-Times?
Malan: I was there seven years. I started as a reporter in June 2006, and then moved up into editing in May of 2010.
O’Donovan: And in the time that you were there as their data person, what are some stories that you were particularly proud of, or made you feel like this was a burgeoning field?
Malan: We focused on intensely local projects at the Caller-Times. One of the ones that I was really proud of I worked on with our city hall reporter, Jessica Savage. City streets are a huge issue in Corpus Christi. If you’ve ever driven here, you know they are just horrible — a disaster. And the city is trying to find a billion dollars to fix them.

So our city hall reporter found out that the city keeps a database of scores called the Pavement Condition Index. Basically, it’s the condition of your street. So we got that database, merged it with a file of streets, and color-coded it so people could see exactly what condition their street was in, and we put it in a database for people to find their exact block. This was something the city did not want to give us at first, because if people know their street scores, they’re going to demand that we do something about it. We’re like, “Yeah, that’s kind of the idea.” But that database became the basis for an entire special section on our streets. We used it to find people on streets that scored a 0, and talked about how it affects their lives — how often they have to repair their cars, how often they walk through giant puddles.
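
For the technically curious, the join Malan describes maps onto a few lines of pandas. This is a hedged sketch, not the Caller-Times’ actual workflow; the file names, column names, and score bands are all hypothetical.

    import pandas as pd

    # Hypothetical inputs: a table of Pavement Condition Index scores and a
    # table of street segments, joined on a shared street identifier.
    scores = pd.read_csv("pavement_condition_index.csv")   # street_id, pci_score
    streets = pd.read_csv("street_segments.csv")           # street_id, name, block

    merged = streets.merge(scores, on="street_id", how="left")

    # Bucket the 0-100 scores into bands for color-coding on a map.
    merged["condition"] = pd.cut(
        merged["pci_score"],
        bins=[-1, 40, 70, 100],
        labels=["poor", "fair", "good"],
    )
    print(merged.head())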

And then we paired it with a breakout box of every city council member and their score. We did a map online, which has actually been a big hit for over a year now, while the city discusses how it’s going to find this money. People have been using it as a basis for the debate they’re having, which, to me, is really how we make a difference. Using this data that the city had, bringing it to light, making it accessible has really changed the debate here for people. So that’s one thing I’m really proud of — that we can give people information to make informed decisions.

O’Donovan: Part of your new position is going to be facilitating and assisting other journalists in starting to understand how to do this kind of work. How do you tell reporters that this isn’t scary — that it’s something they can do or they can learn? How do you begin that conversation?
Malan: [At the Caller-Times] we adopted the philosophy that data journalism isn’t just something that one nerdy person in the office does, but something that everyone in the newsroom should have in their toolbox. It really enhances every beat at the newspaper.

I would do occasional training sessions on Excel, Google Fusion Tables, and Caspio to show everyone in the newsroom, “Here’s what’s possible.” Some people really pick up on it and take it and run with it. Some people are not as math oriented and are not going to be able to take it and run with it themselves, but at least they know those tools are available and what it’s possible to do with them.

So some of the reporters would be just aware of how we could analyze data and they would keep their eyes open for databases on their beats, and other reporters would run with it. That philosophy is very important in any newsroom today. A lot of what I’m going to be doing with IRE and INN is working with the INN members in helping them to gather the data and analyze it and inform their local reporting. So a lot of the same roles, but in a broader context.

O’Donovan: So a lot of it is understanding that everyone is going to come at it with a different skill level.
Malan: Yes, absolutely. All our members have different levels of skills. Some of our members have very highly skilled data teams, like ProPublica and the Center for Public Integrity — they’re really at the forefront of data journalism. Other members are maybe one- or two-person newsrooms that may not have the training and don’t have any reporters with those skills. So the skill sets are all over the board. But it will be my job to help newsrooms, especially the smaller ones, plug into those resources — especially the resources at IRE, with its data library and training — as best they can. We’ll help them bring up their own skills and enhance their own reporting.
O’Donovan: When a reporter comes to you and says, “I just found this dataset or I just got access to it” — how do you dive into that information when it comes to looking for stories? How do you take all of that and start to look for what could turn into something interesting?
Malan: A lot of it depends on the data set. Approach every set of data as a source that you’re interviewing. What is available there? What is maybe missing from the data? That’s something you want to think about too. And you definitely want to narrow it down: A lot of data sets are huge, especially these federal data sets that might have records containing, I don’t know, 120 fields, but maybe you’re only interested in three of them. So you want to get to know the data set and what is interesting in it, and you want to really narrow your focus.
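
That interviewing-and-narrowing pass translates directly into a first exploratory script. Here is a minimal sketch in Python with pandas, assuming a hypothetical wide federal file; the column names are placeholders.

    import pandas as pd

    df = pd.read_csv("federal_dataset.csv")   # a wide file, maybe 120 fields

    # First interview questions: what is available, and what is missing?
    print(df.columns.tolist())
    print(df.isna().sum())

    # Narrow the focus to the handful of fields the story actually needs.
    focus = df[["state", "incident_date", "damage_estimate"]]
    print(focus.describe(include="all"))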

One collaboration that INN did used data gathered by NASA for the FAA — essentially near misses: incidents at airports, like hitting deer on the runway, and all these little things that can happen but aren’t necessarily reported. They all get compiled in this database, and pilots write narratives about the incidents, so that field is very rich. There were four or five INN members who collaborated on that, and they all came away with different stories, because each found something that was interesting locally.

O’Donovan: This position you’ll hold is about bringing the work of INN and IRE together. What’s that going to look like? We talk all the time about how journalism is moving in a more networked direction — where do you see this fitting into that?
Malan: IRE and INN have always had a very close relationship, and I think this position just kind of formalizes that. I will be helping INN members plug into the resources of IRE, especially the data library. I’ll be working closely with Liz Lucas, the database director at IRE, and I’m actually going to be living near IRE so I can work more closely with them. Some of the data there is very underutilized, and it’s really interesting and maybe hasn’t been used in any projects, especially on a national level.

So we can take that data, and I can help analyze it, help slice it for the various regions we might be looking at, and help the INN members use that data for their stories. I’ll basically be acting almost as a translator, getting this data out of IRE and helping the INN members put it to use.

Going the other way, INN members might come up with a project idea where the data isn’t available from the data library, or where we have to gather data from every state individually. We’ll compile that, and whatever we end up with will be sent back to the IRE library and made available to other IRE members. So it’s a two-way relationship.

O’Donovan: So in terms of managing this collaboration, what are the challenges? Are you thinking of building an interface for sharing data or documents?
Malan: We’re going to be setting up a kind of committee of data people within INN to have probably monthly calls and just discuss what they’re working on and brainstorm possible ideas. I want it to be a very organic, ground-up process — I don’t want to dictate what the projects should be. I want the members to come up with their own ideas. So we’ll be brainstorming and coming up with things, and we’ll be managing the group through Basecamp and communicating that way. A lot of the members are already on Basecamp and communicate that way through INN.

We’ll be communicating through this committee and coming up with ideas, and I’ll be working with individual members, reaching out to them. If we come up with an idea that deals with health care, for example, I might reach out to some of the members that are especially focused on health care and try to bring other members in on it.

O’Donovan: Do you foresee collaborations between members, like shared reporting and that kind of thing?
Malan: Yeah, depending on the project. Some of it might be shared reporting; some of it might be that someone does a main interview. If we’re doing a crime story dealing with the FBI’s Uniform Crime Reports, maybe we don’t need one reporter from every property — we nominate one person to do the interview with the FBI that everyone can use in their own story, which they localize with their own data. So, yeah, depending on the project, we’ll have to see how the reporting shakes out.
O’Donovan: Do you have any specific goals or types of stories you want to tell, or even just specific data sets you’re eager to get a look at?
Malan: I think there are several interesting sets in the IRE data library that we might go after at first. There are really interesting health sets, for example, from the FDA — one of them is a database of adverse effects from drugs, complaints that people make that drugs have had adverse effects. So yeah, some of those will be ready to parse and analyze right off the bat.

Some other data sets we might be looking at will be a little harder to get; they’ll take some FOIs and some time. There are several major areas that our members focus on and that we’ll be looking at projects for. Environment, for example — fracking is a large issue, and how the environment affects public health. Health care, especially with the Affordable Care Act coming into effect next year, is going to be a large one. Politics and government — how money influences politicians is a huge area as we come up on the 2014 midterms and the 2016 elections. And education is another issue, with achievement gaps, graduation rates, charter schools — those are all large issues that our members follow. Finding those commonalities in data sets and digging into them is going to be my first priority.

O’Donovan: The health question is interesting. Knight announced its next round of News Challenge grants is going to be all around health.
Malan: I’m excited about that. We have several members that are really specifically focused on health, so I feel like we might be able to get something good with that.
O’Donovan: Health care stuff or more public health stuff?
Malan: It’s a mix, but a lot of stuff is geared toward the Affordable Care Act now.
O’Donovan: Gathering these data sets must often involve a lot of coordination across states and jurisdictions.
Malan: Yeah, absolutely. One thing I am a little nervous about is the Supreme Court’s recent ruling in the Virginia case, which says states can require you to be a resident to file an FOI request. That might complicate things a little bit. I know there are several groups working on lists of people who will file an FOI request for you in various states. But that can slow down the process, put a little kink in things, and add to the timeline. I’m concerned, of course, that now that it’s been ruled constitutional, every state might make that the law. It could be a huge thing. A management nightmare.
O’Donovan: What kind of advice do you normally give to reporters who are struggling to get information that they know they should be allowed to have?
Malan: That’s something we encountered a lot here, especially around getting data in the proper format. Laws on that can vary from state to state. A lot of governments will give you paper or PDF format instead of the Excel or text file that you asked for. It’s always a struggle.

The advice is to know the law as best you can, know what exceptions are allowed under your state law, be able to quote — you don’t have to have the law memorized, but be able to quote specific sections that you know are on your side. Be prepared with your requests, and be prepared to fight for it. And in a lot of cases, it is a fight.

O’Donovan: That’s an interesting intersection of technical and legal skill. That’s a lot of education dollars right there.
Malan: Yeah, no kidding.
O’Donovan: When you do things like attend the NICAR conference and assess the scene more broadly, where do you see the most urgent gaps in the data journalism field? Is it that we need more data analysts? More computer scientists? More reporters fluent in communicating with government? More legal aid? If you could allocate more resources, where would you put them right now?
Malan: There’s always going to be a need for more very highly skilled data journalists who can gather these national sets, analyze them, clean them, get them into a digestible format, visualize them online, and inform readers. I would like to see more general beat reporters interested in data and at least getting skills in Excel and even Access — because the beat reporters are the ones on the ground, using their sources, finding these data sets, or not finding them if they’re not aware of what data is out there. I would really like there to be a bigger push to educate most general beat reporters to at least a certain level.
O’Donovan: Where do you see the data journalism movement headed over the next couple years? What would your next big hope for the field be?
Malan: Well, of course I hope for it to go kind of mainstream, and that all reporters will have some sort of data skills. It’s harder, of course, with fewer and fewer resources — reporters are learning how to tweet and Instagram, and there are demands on their time that have never been there before.

But I would hope it would become just a normal part of journalism, that there would be no more “data journalism” — that it just becomes part of what we do, because it’s invaluable to reporting, to really helping ferret out the truth, and to giving context to stories.

June 27 2013

16:27

Sensor journalism, storytelling with Vine, fighting gender bias and more: Takeaways from the 2013 Civic Media Conference

Are there lessons journalists can learn from Airbnb? What can sensors tell us about the state of New York City’s public housing stock? How can nonprofits, governments, and for-profit companies collaborate to create places for public engagement online?

Those were just a few of the questions asked at the annual Civic Media Conference hosted by MIT and the Knight Foundation in Cambridge this week. It covered a diverse mix of topics, ranging from government transparency and media innovation to disaster relief and technology’s influence on immigration issues. (For a helpful summary of the event’s broader themes, check out Knight VP of journalism and innovation Michael Maness‘s wrap-up talk.)

There was a decided bent towards pragmatism in the presentations, underscored by Knight president Alberto Ibargüen‘s measured, even questioning introduction to the News Challenge winners. “I ask myself what we have actually achieved,” he said of the previous cycles of the News Challenge. “And I ask myself how we can take this forward.”

While the big news was the announcement of this year’s winners and the fate of the program going forward, there were plenty of discussions and presentations that caught our attention.

Panelists and speakers — from Republican Congressman Darrell Issa and WNYC’s John Keefe to Columbia’s Emily Bell and recent MIT grads — offered insights on engagement (both online and off), data structure and visualization, communicating with government, the role of editors, and more. In the words of The Boston Globe’s Adrienne Debigare, “We may not be able to predict the future, but at least we can show up for the present.”

One more News Challenge

Though Ibargüen spoke about the future of the News Challenge in uncertain terms, Knight hasn’t put the competition on the shelf quite yet. Maness announced that there would indeed be one more round of the challenge this fall, with a focus on health. That’s about all we know about the next challenge; Maness said Knight is still in the planning stages of the cycle and whatever will follow it. He said they want the challenge to address questions about tools, data, and technology around health care.

Opening up the newsroom

One of the more lively discussions at the conference focused on how news outlets can identify and harness the experience of outsiders. Jennifer Brandel, senior producer for WBEZ’s Curious City, said one way to “hack” newsrooms was to open them up to stories from freelance writers, but also to more input from the community itself. Brandel said journalists could also look beyond traditional news for inspiration for storytelling, mentioning projects like Zeega and the work of the National Film Board of Canada.

Laura Ramos, vice president of innovation and design for Gannett, said news companies can learn lessons on user design and meeting user needs from companies like Airbnb and Square. Ramos said another lesson to take from tech companies is discovering, and addressing, specific needs of users.


Bell, director of the Tow Center for Digital Journalism at Columbia University, said one solution for innovation at many companies has been creating research and development departments. But with R&D labs, the challenge is integrating the experiments of the labs, which are often removed from day-to-day activity, with the needs of the newsroom or other departments. Bell said many media companies need leadership that is open to experimentation and can juggle the immediate needs of the business with big-picture planning. Too often in newsrooms, or around the industry, people follow old processes or old ideas and are unable to change, something Bell compared to “watching six-year-olds playing soccer,” with everyone running to the ball rather than performing their role.

Former Knight-Mozilla fellow Dan Schultz said the issue of innovation comes down to how newsrooms allocate their attention and resources. Schultz, who was embedded at The Boston Globe during his fellowship, said newsrooms need to better allocate their developer and coding talent between day-to-day operations like dealing with the CMS and experimenting on tools that could be used in the future. Schultz said he supports the idea of R&D labs because “good technology needs planning,” but the needs of the newsroom don’t always meet with long-range needs on the tech side.

Ramos and Schultz both said one of the biggest threats to change in newsrooms can be those inflexible content management systems. Ramos said the sometimes rigid nature of a CMS can force people to make editorial decisions based on where stories should go, rather than what’s most important to the reader.

Vine, Drunk C-SPAN, and gender bias

!nstant: There was Nieman Foundation/Center for Civic Media crossover at this year’s conference: 2013 Nieman Fellows Borja Echevarría de la Gándara, Alex Garcia, Paula Molina, and Ludovic Blecher presented a proposal for a breaking news app called !nstant. The fellows created a wireframe of the app after taking Ethan Zuckerman’s News and Participatory Media class.

The app, which would combine elements of liveblogging and aggregation around breaking news events, was inspired by the coverage of the Boston marathon bombing and manhunt. The app would pull news and other information from a variety of sources, “the best from participatory media and traditional journalism,” Molina said. Rather than being a simple aggregator, !nstant would use a team of editors to curate information and add context to current stories when needed. “The legacy media we come from is not yet good at organizing the news in a social environment,” said Echevarría de la Gándara.

Drunk C-SPAN and Opened Captions: Schultz also presented a project — or really, an idea — that seems especially timely when more Americans than usual are glued to news coming out of the Capitol. When Schultz was at the Globe, he realized it would be both valuable and simple to create an API that pulls closed-captioning text from C-SPAN’s video files, a project he called Opened Captions, which we wrote about in December. “I wanted to create a service people could subscribe to whenever certain words were spoken on C-SPAN,” said Schultz. “But the whole point is [the browser] doesn’t know when to ask the questions. Luckily, there’s a good technology out there called WebSocket that most browsers support that allows the server and the browser to talk to each other.”
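
The push model Schultz describes is straightforward to sketch. Below is a minimal WebSocket client in Python using the websocket-client package; the endpoint URL is hypothetical, and Opened Captions’ actual endpoint and protocol (see the project on GitHub) may differ.

    # pip install websocket-client
    from websocket import create_connection

    ws = create_connection("ws://example.com/captions")  # hypothetical feed
    try:
        while True:
            line = ws.recv()  # blocks until the server pushes new caption text
            if "sequester" in line.lower():  # react when a watched word airs
                print("Just said on C-SPAN:", line)
    finally:
        ws.close()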

To draw attention to the possibilities of this technology, Schultz began experimenting with a project called Drunk C-SPAN, in which he aimed to track key terms used by candidates in a televised debate. The more the pols repeat themselves, the more bored the audience gets and the “drunker” the program makes the candidates sound.

But while Drunk C-SPAN was topical and funny, Schultz says the tool should be less about what people are watching and more about what they could be watching. (Especially since almost nobody in the gen pop is watching C-SPAN regularly.) Specifically, he envisions a system in which Opened Captions could send you data about what you’re missing on C-SPAN, translate transcripts live, or alert you when issues you’ve indicated an interest in are being discussed. For the nerds in the house, there could even be a badge system based on how much you’ve watched.

Schultz says Opened Captions is fully operational and available on GitHub, and he’s eager to hear any suggestions around scaling it and putting it to work.

Follow Bias: Follow Bias is a Twitter plugin that calculates and visualizes the gender diversity of the people you follow. When you sign in to the app, it graphs how many of the accounts you follow are male, female, brands, or bots. Created by Nathan Mathias and Sarah Szalavitz of the MIT Media Lab, Follow Bias is built to counteract the pernicious function of social media that allows us to indulge our unconscious biases and pass them along to others, contributing to gender disparity in the media rather than counteracting it.

The app is still in private beta, but a demo, which gives a good summary of gender bias in the media, is online here. “The heroes we share are the heroes we have,” it reads. “Among lives celebrated by mainstream media and sites like Wikipedia, women are a small minority, limiting everyone’s belief in what’s possible.” The Follow Bias server updates every six hours, so the hope is that users will try to correct their biases by broadening the diversity of their Twitter feed. Eventually, Follow Bias will offer metrics and follow recommendations, and will allow users to compare themselves to their friends.

LazyTruth: Last fall, we wrote about Media Lab grad student Matt Stempeck’s LazyTruth, the Gmail extension that helps fact-check emails, particularly chain letters and phishing scams. Stempeck told the audience at the Civic Media Conference that the tool now has around 7,000 users. He said the format of LazyTruth may have capped its growth: “We’ve realized the limits of Chrome extensions, and browser extensions in general, in that a lot of people who need this tool are never going to install browser extensions.”

Stempeck and his collaborators have created an email reply service for LazyTruth that lets users send suspicious messages to ask@lazytruth.com and get an answer. Stempeck said they’ve also expanded their misinformation database with information from Snopes, Hoax-Slayer, and Sophos, an antivirus and computer security company.

LazyTruth is now also open source, with the code available on GitHub. Stempeck said he hopes to find funding to expand the fact-checking into social media platforms.

Vine Toolkit: Recent MIT graduate Joanna Kao is working on a set of tools that would allow journalists or anyone else to use Vine in storytelling. The Vine Toolkit would provide several options to add context around the six-second video clips.

Kao said Vine offers several strengths for journalists: the short length, ease of use, and the built-in social distribution network around the videos. But the length is also problematic, she said, because it doesn’t provide context for viewers. (Instagram’s moving in on this turf.) One part of the Vine Toolkit, Vineyard, would let users string together several Vines that could be captioned and annotated, Kao said. Another tool, VineChatter, would allow a user to see conversations and other information being shared about specific Vine videos.

Open Space & Place: Of algorithms and sensor journalism

WNYC: We also heard from WNYC’s John Keefe during the Open Space & Place discussion. Keefe shared the work WNYC did around tracking Hurricane Sandy, and, of course, the Lab’s beloved Cicada Project. (Here’s our most recent check-in on that invasion topic.)


As Keefe has told the Lab in the past, the next big step in data journalism will be figuring out what kind of stories can come out of asking questions of data. To demonstrate that idea, Keefe said WNYC is working on a new project measuring air quality in New York City by strapping sensors to bikers. This summer, they’ll be collaborating with the Mailman School of Public Health to do measurement runs across New York. Keefe said the goal would be to fill in gaps in government data supplied by particulate measurement stations in Brooklyn and the Bronx. WNYC is also interested in filling in data gaps around NYC’s housing authority, says Keefe. After Hurricane Sandy, some families living in public housing went weeks without power and longer without heat or hot water. Asked Keefe: “How can we use sensors or texting platforms to help these people inform us about what government is or isn’t doing in these buildings?”

With the next round of the Knight News Challenge focusing on health, keep an eye on these data-centric, sensor-driven public health projects, because they’re likely to be going places.

Mapping the Globe: Another way to visualize the news, Mapping the Globe lets you see geographic patterns in coverage by mapping The Boston Globe’s stories. The project’s creator, Lab researcher Catherine D’Ignazio, used the geo-tagged locations already attached to more than 20,000 articles published since November 2011 to show how many of them relate to specific Boston neighborhoods — and by zooming out, how many stories relate to places across the state and worldwide. Since the map also displays population and income data, it’s one way to see what areas might be undercovered relative to who lives there — a geographical accountability system of sorts.

This post includes good screenshots of the prototype interactive map. The patterns raise lots of questions about why certain areas receive more attention than others: Is the disparity tied to race, poverty, unemployment, the location of Globe readers? But D’Ignazio also points out that there are few conclusive correlations or clear answers to her central question — “When does repeated newsworthiness in a particular place become a systemic bias?”

June 20 2013

18:24

A warning from Matt Waite about data journalism and race

“If you’re expecting talk-radio and television shout fests to talk about how awesome your statistical validity is, you’re an idiot.”

Matt Waite has an excellent post on Source today that tells the story of early data journalism in Florida during the 2000 presidential election. He delves into issues of race and identity, and explains how easily journalists with good sense can mix things up — and miss big stories — because of how quickly numbers can obfuscate reality.

Race and ethnicity are tricky topics with loads of nuance and definitional difficulties. But they aren’t the only places these issues come up. Anytime you’re comparing data across agencies and across geographies, be on high alert for mismatches. Crime is a huge issue—jurisdictions have different definitions of what constitutes a big theft versus a little one, for instance. Driving laws are another—what constitutes reckless driving changes state to state. Budgets are another nightmare—what dollar figure requires a bid or not changes from city to city.

Getting the metadata, getting someone on the phone and basic descriptive statistics will help you avoid traps and hopefully let you avoid getting your butt kicked like I did.

May 23 2013

16:37

Pew’s new data blog fills in the contextual gaps between information and stories

The Pew Research Center launched a new blog earlier this week that’s supposed to provide Pew-quality data and information at a real-time pace. It’s called Fact Tank, and it will be a home for what Pew calls its “unique brand of data journalism.”

Since Tuesday, they’ve written up data snapshots on topics like Secretary of State John Kerry’s approval rating, American support for drone usage, and media coverage of the Oklahoma tornado. Alan Murray left The Wall Street Journal to head the Pew Research Center in November.

May 15 2013

07:00

I am a coding denier

There is an exchange that sometimes takes place, perfectly described by Beth Ashton, between those who use technology and those who don’t. It goes like this:

Prospective data journalist: ‘I’d really like to learn how to do data journalism but I can’t do statistics!’

Data journalist: ‘Don’t let that put you off, I don’t know anything about numbers either, I’m a journalist, not a mathematician!’

Prospective data journalist: ‘But I can’t code, and it all looks so codey and complicated’

Data journalist: That’s fine, NONE OF US can code. None of us. Open angle bracket back slash End close angle bracket.

“These people are coding deniers,” argues Beth.

I think she’s on to something. Flash back to a week before Beth published that post: I was talking to Caroline Beavon about the realisation of just how hard-baked ‘coding’ was into my workflow:

  • A basic understanding of RSS lies behind my ability to get regular updates from hundreds of sources
  • I look at repetitiveness in my work and seek to automate it where I can
  • I look at structure in information and use that to save time in accessing it

These are all logical responses to an environment with more information than a journalist can reasonably deal with, and I have developed many of them almost without realising.
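
To make the RSS bullet above concrete: here is a minimal sketch of that habit in Python, using the feedparser library. The feed URLs are placeholders, and a real version would persist the set of seen links between runs.

    # pip install feedparser
    import feedparser

    FEEDS = [
        "http://example.com/council-agendas.rss",   # hypothetical beat sources
        "http://example.com/court-listings.rss",
    ]

    seen = set()  # persist this to disk in a real workflow

    for url in FEEDS:
        for entry in feedparser.parse(url).entries:
            if entry.link not in seen:  # only surface items we haven't read
                seen.add(entry.link)
                print(entry.title, "->", entry.link)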

They are responses as logical as deciding to use a pen to record information when human memory cannot store it reliably alone. Or deciding to learn shorthand when longhand writing cannot record reliably alone. Or deciding to use an audio recorder when that technology became available.

One of the things that makes us uniquely human is that we reach for technological supports – tools – to do our jobs better. The alphabet, of course, is a technology too.

But we do not argue that shorthand comes easy, or that audio recorders can be time consuming, or that learning to use a pen takes time.

So: ‘coding’ – whether you call it RSS, or automation, or pattern recognition – needs to be learned. It might seem invisible to those of us who’ve built our work patterns around it – just as the alphabet seems invisible once you’ve learned it. But, like the alphabet, it is a technology all the same.

But secondly – and more importantly – for this to happen as a profession we need to acknowledge that ‘coding’ is a skill that has become as central to working effectively in journalism as using shorthand, the pen, or the alphabet.

I don’t say ‘will be central’ but ‘has become‘. There is too much information, moving too fast, to continue to work with the old tools alone. From social networks to the quantified self; from RSS-enabled blogs to the open data movement; from facial recognition to verification, our old tools won’t do.

So I’m not going to be a coding denier. Coding is to digital information what shorthand was to spoken information. There, I’ve said it. Now, how can we do it better?

April 03 2013

22:05

Intercontinental collaboration: How 86 journalists in 46 countries can work on a single investigation


On Thursday morning, the International Consortium of Investigative Journalists will begin releasing detailed reports on the workings of offshore tax havens. A little over a year ago, 260 gigabytes of data were leaked to ICIJ executive director Gerard Ryle; they contained information about the finances of individuals in over 170 countries.

Ryle was a media executive in Australia at the time he received the data, says deputy director Marina Walker Guevara. “He came with the story under his arm.” Walker Guevara says the ICIJ was surprised Ryle wanted a job in their small office in Washington, but soon realized that it was only through their international scope and experience with cross-border reporting that the Offshore Project could be executed. The result is a major international collaboration that has to be one of the largest in journalism history.

“It was a huge step. As reporters and journalists, the first thing you think is not ‘Let me see how I can share this with the world.’ You think: ‘How can I scoop everyone else?’ The thinking here was different.” Walker Guevara says the ICIJ seriously considered keeping the team to a core five or six members, but ultimately decided to go with the “most risky” approach when they realized the enormous scope of the project: Journalists from around the world were given lists of names to identify and, if they found interesting connections, were given access to Interdata, the secure, searchable, online database built by the ICIJ.

Just as the rise of information technology has allowed new competition for the attention of audiences, it’s also enabled traditional news organizations to partner in what can sometimes seem like dizzyingly complex relationships. The ICIJ says this is the largest collaborative journalism project they have ever organized, with the most comparable involving a team of 25 cross-border journalists.

In the end, the Offshore Project brings together 86 journalists from 46 countries into an ongoing reporting collaboration. German and Canadian news outlets (Süddeutsche Zeitung, Norddeutscher Rundfunk, and the CBC) will be among the first to report their findings this week, with The Washington Post beginning its report on April 7, just in time for Tax Day. Reporters from more than 30 other publications also contributed, including Le Monde, the BBC, and The Guardian. (The ICIJ actually published some preliminary findings in conjunction with the U.K. publications as a teaser back in November.)

“The natural step wasn’t to sit in Washington and try to figure out who is this person and why this matters in Azerbaijan or Romania,” Walker Guevara said, “but to go to our members there — or a good reporter if we didn’t have a member — give them the names, invite them into the project, see if the name mattered, and involve them in the process.”

Defining names that matter was a learning experience for the leaders of the Offshore Project. Writes Duncan Campbell, an ICIJ founder and current data journalism manager:

ICIJ’s fundamental lesson from the Offshore Project data has been patience and perseverance. Many members started by feeding in lists of names of politicians, tycoons, suspected or convicted fraudsters and the like, hoping that bank accounts and scam plots would just pop out. It was a frustrating road to follow. The data was not like that.

The data was, in fact, very messy and unstructured. Between a bevy of spreadsheets, emails, PDFs without OCR, and pictures of passports, the ICIJ still hasn’t finished mining all the data from the raw files. Campbell details the complicated process of cleaning the data and sorting it into a searchable database. Using NUIX software licenses granted to the ICIJ for free, it took a British programmer two weeks to build a secure database that would allow all of the far-flung journalists not only to safely search and download the documents, but also to communicate with one another through an online forum.

“Once we went to these places and gathered these reporters, we needed to give them the tools to function as a team,” Walker Guevara said.

Even so, some were so overwhelmed by the amount of information available, and so unaccustomed to hunting for stories in a database, that the ICIJ ultimately hired a research manager to do searches for reporters and send them the documents via email. “We do have places like Pakistan where the reporters didn’t have much Internet access, so it was a hassle for him,” says Walker Guevara, adding that there were also security concerns. “We asked him to take precautions and all that, and he was nervous, so I understand.”

They also had to explain to each of the reporting teams that they weren’t simply on the lookout for politicians hiding money and people who had broken the law. “First, you try the name of your president. Then, your biggest politician, former presidents — everybody has to go through that,” Walker Guevara says. While a few headline names did eventually appear — Imelda Marcos, Robert Mugabe — she says some of the most surprising stories came from observing broader trends.

“Alongside many usual suspects, there were hundreds of thousands of regular people — doctors and dentists from the U.S.,” she says. “It made us understand a system that is used a lot more than you think. It’s not just people breaking the law or politicians hiding money, but a lot of people who may feel insecure in their own countries. Or hiding money from their spouses. We’re actually writing some stories about divorce.”

In the 2 million records they accessed, ICIJ reporters began to get an understanding of the methods account holders use to avoid association with these accounts. Many use “nominee directors,” a process which Campbell says is similar to registering a car in the name of a stranger. But in their post about the Offshore Project, the ICIJ team acknowledges that, to a great extent, most of the money being channeled through offshore accounts and shell companies is actually not being used for illegal transactions. Defenders of the offshore banks say they “allow companies and individuals to diversify their investments, forge commercial alliances across national borders, and do business in entrepreneur-friendly zones that eschew the heavy rules and red tape of the onshore world.”

Walker Guevara says that, while that can be true, the “parallel set of rules” that governs the offshore world so disproportionately favors the elite, wealthy few as to be unethical. “Regulations, bureaucracy, and red tape are bothersome,” she says, “but that’s how democracy works.”

Perhaps the most interesting question surrounding the Offshore Project, however, is how to get traditional shoe-leather journalists up to speed on an international story that involves intensive data crunching. Walker Guevara says it’s all about recognizing when the numbers cease to be interesting on their own and putting them in global context. Ultimately, while it’s rewarding to be able to trace dozens of shell companies to a man accused of stealing $5 billion from a Russian bank, someone has to be able to connect the dots.

“This is not a data story. It was based on a huge amount of data, but once you have the name and you look at your documents, you can’t just sit there and write a story,” says Walker Guevara. “That’s why we needed reporters on the ground. We needed people checking courthouse records. We needed people going and talking to experts in the field.”

All of the stories that result from the Offshore Project — some of which could take up to a year to be published — will live on a central project page at ICIJ.org. The team is also considering creating a web app that will allow users to explore some (though probably not all) of the data. In terms of the unique tools they built, Walker Guevara says most are easily replicable by anyone using NUIX or dtSearch software, but they won’t be open sourced. Other lessons from the project, like the inherent vulnerability of PGP encryption and “other complex cryptographic systems popular with computer hackers,” will endure.

“I think one of the most fascinating things about the project was that you couldn’t isolate yourself. It was a big temptation — the data was very addictive,” Walker Guevara says. “But the story worked because there was a whole other level of traditional reporting that was going and checking public records, going and seeing — going places.”

Photo by Aaron Shumaker used under a Creative Commons license.

August 29 2012

08:38

How to teach a journalist programming

Cross-posted from Data Driven Journalism.

Earlier this year I set out to tackle a problem that was bothering me: journalists who had started to learn programming were giving up.

They were hitting a wall. In trying to learn the more advanced programming techniques – particularly those involved in scraping – they seemed to fall into one of two camps:

  • People who learned programming, but were taking far too long to apply it, and so losing momentum – the generalists
  • People who learned how to write one scraper, but could not extend it to others, and so grew frustrated – the specialists

In setting out to figure out what was going wrong, I set myself a task which I have found helpful in taking a fresh perspective on an issue: I started writing a book chapter.

The nice thing about writing books is that they force you to put together a coherent and complete narrative about an entire process. You identify gaps that you weren’t otherwise aware of, and you have to put yourself in the place of someone with no knowledge at all. You take nothing for granted.

So my starting point was this: what is a good way to learn how to write scrapers?

That’s a different question to ‘How do I write a scraper?’ and also to ‘How do I learn programming?’ And that’s important. Because most of the resources available fell into one of those two camps.

The people trying to learn programming were hitting a common problem in learning: lack of feedback. They might be able to change a variable in Ruby, but how would that help in journalism? It was like learning the structure of the entire French language just so they could go to the corner shop and ask for a loaf of bread.

The people learning how to write one scraper were hitting another common problem: learning how to do one task well, rather than the underlying principles. This was like someone learning how to ask for a loaf of bread in French, but not being able to extend that knowledge into asking for directions home.

I tackled both by beginning the chapter with probably the simplest scraper you can write: a spreadsheet formula in Google Docs. This provided the instant feedback that the generalists lacked, but the formula was also used to introduce some key concepts in programming: functions, strings, indexes, and parameters. These would provide key principles that the specialists lacked, and which future chapters could build on.
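
The formula in question is along the lines of Google Docs’ =ImportHTML("http://example.com/page", "table", 1), whose three arguments neatly demonstrate strings, indexes, and parameters. For readers working outside a spreadsheet, here is a rough Python analogue of the same idea, using pandas; the URL is a placeholder.

    # pip install pandas lxml
    import pandas as pd

    # read_html() is a function; its argument is a string; it returns a list
    # of the page's tables, from which we pick one by index.
    tables = pd.read_html("http://example.com/page")
    first = tables[0]
    print(first.head())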

Learning differently

I also looked at how journalists tried to learn programming, and how programmers developed, and realised something else: journalists and programmers learned differently.

I’m generalising wildly, of course, but journalists – particularly student journalists – often try to learn programming from books. That may sound like common sense, but it’s not how you learn an art or a science – and programming is both.

Programmers – if I’m to generalise wildly again – typically combine books (which they don’t read cover to cover) with documentation, adapting other code, trial and error, and each other. When they teach journalists, they often don’t realise that journalists don’t always share that culture.

And journalists – coming traditionally from a background in the humanities – are used to learning from books: static knowledge. Teaching programming to journalists then, I realised, would also mean teaching how programmers learn.

So my chapter introducing that first scraper introduced some other key concepts as well. It would direct readers to the documentation on the function being used, and invite them to engage in some trial and error to work out a solution to a problem. As more scraper tutorials were added, they introduced more key concepts in programming – importantly, without having to learn an entirely new language, and with documentation and trial and error running throughout, along with the principle of adapting other code.

I tested the approach at the News:Rewired conference. Can you teach scraping in 20 minutes? At a basic level, yes: it seemed you could.

Agile publishing

After 20,000 words I realised that my book chapter was turning into a book. Meanwhile, a colleague had told me about Leanpub: a website that allowed people to publish books as they were being written, with readers able to download new updates as they came.

The platform suited the book perfectly: it meant I could stagger the publication of the book, Codecademy-style, with readers trying at least one scraper per week, but also having the time to experiment with trial and error before the next chapter was published. It meant that I could respond to feedback on the earlier chapters and adapt the rest of the book before it was published (in one case a Brazilian reader pointed out after the first chapter was published that the Portuguese-language Google Docs uses semicolons instead of commas). If examples used in the book changed then I could replace them. And it meant that if new tools or techniques emerged, I could incorporate them.

It is a programming-style approach to publishing – trial and error – which very much suits the spirit of the book. It’s extra work, but it makes for a much better writing experience. I hope the readers think so too.

Scraping for Journalists is available at Leanpub.com/ScrapingForJournalists

04:53

'Thunderdome' takes shape at Digital First

NetNewsCheck :: Digital First Media Editor in Chief Jim Brady talks about the fundamental transition underway in Journal Register Co.'s news gathering operation. News aggregation and centralized production of big national stories as well as health, travel and education feature wells are adding heft to papers' websites while letting the papers focus on what they do best: local news. There is also a centralized data journalism team and a SWAT team of producers who can double down on big breaking stories.

A report by Michael Depp, www.netnewscheck.com

August 17 2012

09:55

Has the increase in data changed your newsroom?

I’m currently researching whether newsrooms have been changed by the increased availability of data – from FOI and data.gov sites to open data and APIs. Specifically I’m interested in the watchdog role of journalism, but any other uses are relevant too.

If you work in this area I’d really appreciate it if you could complete the survey below – and share it with others you know who can contribute. Here it is:


August 10 2012

15:43

Two linked data journalism workshops for hacks and hackers

Following on from the sold-out Data Journalism Camp in 2011, DEN has combined forces with the MADE project to offer two linked workshops this autumn.

Download the flyer here
  • DJCAMP2012 with Paul Bradshaw and Megan Knight is aimed at journalists who want to turn data into compelling stories, and runs from 9:30am on Friday, September 21 to 5pm on Saturday, September 22 in the Media Factory on the University of Central Lancashire's Preston Campus.
  • If you want to learn how to build your own data scraper and have at least a basic understanding of the programming languages Ruby and/or Python, then there's a four-hour Scraping Masterclass with ScraperWiki founder Julian Todd from 9:30am to 1:30pm on Saturday, September 22, also in the Media Factory.

Both DJCAMP2012 and the Scraping Masterclass are being co-sponsored by the MADE project and the School of Journalism, Media and Communication at UCLan.

More information and registration details are available here.

August 07 2012

11:51

A case study in online journalism part 3: ebooks (investigating the Olympic torch relay)

8000 Holes - book cover

In part one I outlined some of the data journalism processes involved in the Olympic torch relay investigation; in part 2 I explained how verification, SEO and ‘passive aggressive newsgathering’ played a role. This final part looks at how ebooks offered a new opportunity to tell the story in depth – and to publish while the story was still topical.

Ebooks – publishing before the event has even finished

After a number of stories from a variety of angles I reached a fork in the road. It felt like we had been looking at this story from every angle. More than one editor, when presented with an update, said that they’d already ‘done the torch story’. I would have done the same.

But I thought of a quote on persistence from Ian Hislop that I’d published on the Help Me Investigate blog previously. “It is saying the same true thing again and again and again and again until the penny drops.”

Although it sometimes felt like we might be boring people with our insistence on continuing to dig, we needed, I felt, to say the same thing again. Not the story of ‘Executive carries the torch’, but how that executive and so many others came to carry it, why that mattered, and what the impact was. A longform report.

Traditionally there would have been no space for this story. It would be too long for a newspaper or magazine, and far too short for a book – where the production timescale would have missed any topicality anyway.

But we didn’t have to worry about that – because we had e-publishing.

It still seems incredible to me that we could write up and publish a book on the missed promises of the Olympic torch relay before the relay had even finished – indeed, that we could publish the day before the book’s main case study was likely to run.

But if we wanted to do that, we had about a week to hit that deadline, with important holes in our narrative, and working largely in our spare time.

First, we needed a case study to represent the human impact of the corporate torchbearers and open our book. Quite a few had been mentioned in local newspapers when they discovered that less-than-inspirational individuals had taken their place, but HMI contributor Carol Miers found one who couldn’t have been more deserving: Jack Binstead had received the maximum number of nominations; he was just 15 (half of torchbearer places were supposed to go to young people – they didn’t); and he was tipped to go to the next Paralympics.

We also needed to find out if there was an impact on the genuinely inspirational people who did get to carry the torch – I had been chasing a couple when Geoff Holt came through the site’s comments (see above). That was our ending.

For the middle we needed to pin down some of the numbers around the relay. Comments from earlier stories had indicated that some people didn’t see why it was important that executives were carrying the torches – unaware, perhaps, that promises had been made about where places would go, and what sort of stories torchbearers should have.

In particular, the organisers had promised that 90% of places would be available to the general public and that 50% of places would go to young people aged 12-24. I had to nail down where each chunk of tickets had gone – and at how many points they had been taken away from availability to the ‘general public’. Ultimately, the middle of the book would describe how that 90% got chipped away until it was more like 75%.
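
As a rough worked example – the relay had roughly 8,000 places in total, so:

  90% of 8,000 = 7,200 places promised to the ‘general public’
  75% of 8,000 = 6,000 places effectively left to them
  a shortfall of around 1,200 places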

That middle would then be fleshed out with the themes around what happened to the other 25%: essentially some of the stories we’d already told, plus some others that filled out the picture.

Writing in this way allowed us to go beyond the normal way of writing – shock at a revelation – to identifying where things went wrong and how. For all the anger at corporate sponsors for their allocation of torch relay places, it was ultimately LOCOG’s responsibility to approve nominations, to publish 8,000 “inspirational” nomination stories, and to meet the promises that they had made about how they would be allocated. The buck stopped there.

Thanks to the iterative way we had worked so far – publishing each story as it came, asking questions in public, building an online ‘footprint’ that others could find, establishing collaborative relationships and bookmarking to create an archive – we met our deadline.

It was a timescale which allowed us to tap into interest in the relay while it was still topical, and while executive torchbearers were still carrying the torch.

8,000 Holes: How the 2012 Olympic Torch Relay Lost its Way was published on day 66 of the 70-day Olympic torch relay. All proceeds went to the Brittle Bone Society, of which Jack is an ambassador. The publishers – Leanpub – agreed to give their commission on the book to the charity as well. This was all organised over email in 24 hours, a couple of days before the book went live.

We organised an interview with Jack Binstead which was published in The Guardian the day after – the day that the torch was to go through his home town and the day that he would be flying out of the country to avoid it. An interview with Journalism.co.uk on the ebook itself – Help Me Investigate’s first – was published the same day.

We published data on where torchbearer places went in The Guardian’s datablog two days after that, and serialised the book throughout the week, along with some additional pieces – for example, on how LOCOG missed their target of 50% of places going to young people by over 1,000 places. A lengthier interview with Jack and his mother was published at the end of the week.

In theory this should have captured interest in the torch relay at just the right time – but I think we misjudged two factors.

The first was beyond our control: the weather changed.

Until then, the weather had been awful. When it changed, the mood of the country changed, and there was less interest in the missed promises of the Olympic torch relay. But it also coincided with another change: the final week of the torch relay was also the last few days before the opening ceremony – and as the weather changed, attention shifted to the Olympic Games itself.

The torch relay, which had been squeezed dry of every possible angle for nine weeks, was – finally – yesterday’s news. It was no longer about who was carrying the torch, but about where that torch was going, and who might carry the last one.

Still, the book raised money for a deserving charity, and its story is not over. When the next torch relay comes around, I wonder, will it benefit from a resurgence of interest?

Get the free ebook for the full story: 8,000 Holes: How the 2012 Olympic Torch Relay Lost its Way - Leanpub.com/8000holes

 


August 06 2012

07:38

A case study in online journalism part 2: verification, SEO and collaboration (investigating the Olympic torch relay)

corporate Olympic torchbearers image

Having outlined some of the data journalism processes involved in the Olympic torch relay investigation, in part 2 I want to touch on how verification and ‘passive aggressive newsgathering’ played a role.

Verification: who’s who

Data in this story not only provided leads which needed verifying, but also helped verify leads from outside the data.

In one example, an anonymous tip-off suggested that both children of one particular executive were carrying the Olympic torch on different legs of the relay. A quick check against his name in the data suggested this was so: two girls with the same unusual surname were indeed carrying the torch. Neither mentioned the company or their father. But how could we confirm it?

The answer involved checking planning applications, Google Streetview, and a number of other sources, including newsletters from the private school that they both attended which identified the father.

In another example, I noticed that one torchbearer had mentioned running alongside two employees of Aggreko, who were paying for their torches. I searched for other employees, and found a cake shop which had created a celebratory cake for three of them. Having seen how some corporate sponsors used their places, I went on a hunch and looked up the board of directors, searching in the data first for the CEO Rupert Soames. His name turned up – with no nomination story. A search for other directors found that more than half the executive board were carrying torches – which turned out to be our story. The final step: a call to the company to get a reaction and confirmation.
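
As a sketch of what that kind of check can look like – assuming, hypothetically, that the torchbearer data sits in a Google Docs spreadsheet with names in column A and nomination stories in column C – a QUERY formula pulls out any matching rows:

  =QUERY(A:C, "select * where A contains 'Soames'", 0)

A matching row with an empty story column is the sort of red flag described here.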

The more that we knew about how torch relay places had been used, the easier it was to verify other torchbearers. As a pattern emerged of many coming from the telecoms industry, that helped focus the search – but we had to be aware that having suspicions ‘confirmed’ didn’t mean that the name itself was confirmed – it was simply that you were more likely to hit a match that you could verify.

Scepticism was important: at various times names seemed to match with individuals but you had to ask ‘Would that person not use his title? Why would he be nominated? Would he be that age now?’

Images helped – sometimes people used the same image that had been used elsewhere (you could match this with Google Images ‘match image’ feature, then refine the search). At other times you could match with public photos of the person as they carried the torch.

This post on identifying mystery torchbearers gives more detail.

Passive aggressive newsgathering

Alerts proved key to the investigation. Early on I signed up for daily alerts on any mention of the Olympic torch. 95% of stories were formulaic ‘local town/school/hero excited about torch’ reports, but occasionally key details would emerge in other pieces – particularly those from news organisations overseas.

Google Alerts for Olympic torch

It was from these that I learned exactly how many places Dow, Omega, Visa and others had, and how many people were nominated. It was how I learned about torchbearers who were not even listed on the official site, about the ‘criteria’ that some organisations were supposed to adhere to, about public announcements of places which suggested a change from previous numbers, and more besides.

As I came across anything that looked interesting, I bookmarked and tagged it. Some would come in useful immediately, but most would only come in useful later when I came to write up the full story. Essentially, they were pieces of a jigsaw I was yet to put together. (For example, this report mentioned that 2,500 employees were nominated within Dow for just 10 places. How must those employees feel when they find the company’s VP of Olympic operations took up one of the few places? Likewise, he fitted a broader pattern of sponsorship managers carrying the torch.)

I also subscribed to any mention of the torch relay in Parliament, and any mention in FOI requests.

SEO – making yourself findable

One of the things I always emphasise to my students is the importance of publishing early and often on a subject, to maximise the opportunities for others in the field to find you – and get in touch. This story was no exception. From the earliest stages through to the last week of the relay, users stumbled across the site as they looked for information on the relay – and passed on their concerns and leads.

It was particularly important with a big public event like the Olympic torch relay, which generated a lot of interest among local people. In the first week of the investigation one photographer stumbled across the site because he was searching for the name of one of the torchbearers we had identified as coming from adidas. He passed on his photographs – but more importantly, made me aware that there might be photographs of other executives who had already carried the torch.

That led to the strongest image of the investigation – two executives exchanging a ‘torch kiss’ (shown at the top of this post) – which was in turn picked up by The Daily Mail.

Other leads kept coming. The tip-off about the executive’s daughters mentioned above; someone mentioning two more Aggreko directors – one of whom had never been published on the official site, while the other had been listed and then removed. Questions about a Polish torchbearer who was not listed on the official site or, indeed, anywhere on the web other than the BBC’s torch relay liveblog. Challenges to one story we linkblogged, which led to further background that helped flesh out the processes behind the nominations given to universities.

When we published the ‘mystery torchbearers’ with The Guardian some got in touch to tell us who they were. In one case, that contact led to an interview which closed the book: Geoff Holt, the first quadriplegic to sail single-handed across the Atlantic Ocean.

Collaboration

I could have done this story the old-fashioned way: kept it to myself, done all the digging alone, and published one big story at the end.

It wouldn’t have been half as good. It wouldn’t have had the impact, it wouldn’t have had the range, and it would have missed key ingredients.

Collaboration was at the heart of this process. As soon as I started to unearth the adidas torchbearers I got in touch with The Guardian’s James Ball. His report the week after added reactions from some of the companies involved, and other torchbearers we’d simultaneously spotted. But James also noticed that one of Coca Cola’s torchbearers was a woman “who among other roles sits on a committee of the US’s Food and Drug Administration”.

It was collaborating with contacts in Staffordshire which helped point me to the ‘torch kiss’ image. They in turn followed up the story behind it (a credit for Help Me Investigate was taken out of the piece – it seems old habits die hard), and The Daily Mail followed up on that to get some further reaction and response (and no, they didn’t credit the Stoke Sentinel either). In Bournemouth and Sussex local journalists took up the baton (sorry), and the Times Higher did their angle.

We passed on leads to Ventnor Blog, whose users helped dig into a curious torchbearer running through the area. And we published a list of torchbearers with missing stories in The Guardian, where users helped identify them.

Collaborating with an international mailing list for investigative journalists, I generated datasets of local torchbearers in Hungary, Italy, India, the Middle East, Germany, and Romania. German daily newspaper Der Tagesspiegel got in touch and helped trace some of the Germans.

And of course, within the Help Me Investigate network people were identifying mystery torchbearers, getting responses from sponsors, visualising data, and chasing interviews. One contributor in particular – Carol Miers – came on board halfway through and contributed some of the key elements of the final longform report – in particular the interview that opens the book, which I’ll talk about in the final part tomorrow.

