
October 07 2010

14:00

Los Angeles Times collaborates across the newsroom and with readers to map neighborhood crime

There’s something about the immediacy of the web that makes interactive features seem effortless: One click and the information is there. But of course the feel of the end product is not the same as the process required to get it there. Just ask the Los Angeles Times.

Last week the Times unveiled a new stage in its ongoing mapping project, Mapping L.A. The latest piece lets users check out crime data by neighborhood, including individual crimes and crime trends. Ultimately, the goal is to give locals access to encyclopedia-style information about their neighborhoods, including demographic, crime, and school information. And for reporters, it’s a helpful tool to add context to a story or spot trends. Getting the project where it is now has been a two-year process, drawing on talent across the newsroom and tapping the expertise of the crowd. I spoke with Ben Welsh, the LAT developer working on the project, about what it’s taken to piece it together. Hint: collaboration.

“I was lucky to find some natural allies who had a vision for what we could find out,” Welsh told me. “In some sense it’s the older generation of geek reporters. There’s this whole kind of tradition of that. We talk the same language. They collect all this data — and I want data so we can do stuff online. Even though we don’t have the same bosses, we have this kind of ad hoc alliance.”

Before Welsh could start plotting information, like crime or demographics data, the Times had to back up to a much simpler question: What are the neighborhood boundaries in Los Angeles city and county?

“Because there are no official answers and there are just sort of consensus and history and these things together, we knew from the get-go it was going to be controversial,” Welsh said. “We designed it from the get-go to let people tell us we suck.”

And people did. About 1,500 people weighed in on the first round of the Times’ mapping project. A tool allowed users to create their own boundary maps for neighborhoods. Between the first round and second round, the Times made 100 boundary changes. (Compare the original map to the current one.) More than a year later, Welsh said, “I continue to receive emails that we’re wrong.”

An offshoot project of the neighborhood project was a more targeted question that every Angeleno can answer: “What is the ‘West Side’?” Welsh said the hundreds of responses were impassioned and creative. The West Side project was recently named a finalist for the Online News Association’s annual awards in the community collaboration category.

Welsh has now layered census, school, and crime data into the project. Working with those varied government data sets brings unique problems. “We put all kinds of hours in to clean the data,” Welsh said. “I think a lot of times journalists don’t talk about that part.” At one point, for example, the Times discovered widespread errors in the Los Angeles Police Department data. The department got an early look at the project, supports the Times’ work, and has actually abandoned its own mapping effort, deciding to use the Times’ instead.
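Welsh didn’t detail the Times’ cleaning pipeline, but as a rough, purely illustrative sketch, the kind of sanity checks that “cleaning” usually means for crime records might look like this in Python with pandas; the file name, column names, and bounding box below are hypothetical, not the LAT’s actual data or code:

    # Illustrative only: typical sanity checks on a hypothetical crime extract.
    import pandas as pd

    crimes = pd.read_csv("lapd_extract.csv")          # hypothetical file
    crimes["date"] = pd.to_datetime(crimes["date"], errors="coerce")

    # Flag records that fail basic plausibility tests rather than silently
    # dropping them, so a reporter can ask the agency about the bad rows.
    bad_date = crimes["date"].isna()
    bad_geo = ~crimes["latitude"].between(33.7, 34.9) | ~crimes["longitude"].between(-118.9, -117.6)

    print(bad_date.sum(), "records with unparseable dates")
    print(bad_geo.sum(), "records geocoded outside the rough county bounding box")
    crimes[~(bad_date | bad_geo)].to_csv("crimes_clean.csv", index=False)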

Welsh doesn’t talk about the project in terms of it ever being “finished.” “With everything you add, you hope to make it this living, breathing thing,” he said. In the long run, he hopes the Times will figure out a way to offer a more sophisticated analysis of the data. “That’s a challenging thing,” he said. In the more immediate future, he hopes to expand the geographic footprint of the project.

October 04 2010

07:41

Where should an aspiring data journalist start?

In writing last week’s Guardian Data Blog piece on How to be a data journalist, I asked various people involved in data journalism where they would recommend starting. The answers are so useful that I thought I’d publish them in full here.

The Telegraph’s Conrad Quilty-Harper:

Start reading:

http://www.google.com/reader/bundle/user%2F06076274130681848419%2Fbundle%2Fdatavizfeeds

Keep adding to your knowledge and follow other data journalists/people who work with data on Twitter.

Look for sources of data:

The ONS stats release calendar is a good start (http://www.statistics.gov.uk/hub/release-calendar/index.html). Also look at the government data stores (Data.gov, Data.gov.uk, Data.london.gov.uk, etc.).

Check out WhatDoTheyKnow, Freebase, Wikileaks, ManyEyes, and Google Fusion Tables.

Find out where hidden data is and try to get hold of it: private companies looking for publicity, under-appreciated research departments, public bodies that release data but not in a granular form (e.g. the Met Office).

Test out cleaning/visualisation tools:

You want to be able to collect data, clean it, visualise it and map it.

Obviously you need to know basic Excel skills (pivot tables are how journalists efficiently get headline numbers from big spreadsheets).
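The same pivot-table idea carries over once a dataset outgrows Excel. By way of illustration (this sketch is not part of Quilty-Harper’s note), here is the equivalent in Python with pandas, assuming a hypothetical crimes.csv with one row per incident:

    # A pandas analogue of an Excel pivot table: headline numbers from a big sheet.
    # The file and column names are hypothetical placeholders.
    import pandas as pd

    crimes = pd.read_csv("crimes.csv")        # one row per reported incident
    summary = pd.pivot_table(
        crimes,
        index="neighborhood",                 # pivot rows
        columns="crime_type",                 # pivot columns
        values="report_id",                   # any unique field to count
        aggfunc="count",
        fill_value=0,
    )
    print(summary.head(10))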

For publishing, just use Google Spreadsheets graphs, ManyEyes, or Timetric. Google MyMaps coupled with http://batchgeo.com is a great beginner mapping combo.

Further on from that, you want to try out Google Spreadsheets’ importURL service, Yahoo Pipes for cleaning data, Freebase Gridworks, and Dabble DB.
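For a scripted analogue of those import-and-clean tools, a minimal Python sketch (an illustration with a placeholder URL, not a tool named above) that pulls an HTML table off a web page and tidies it with pandas:

    # Illustrative analogue of spreadsheet import functions plus light clean-up.
    # The URL is a placeholder, not a real dataset.
    import pandas as pd

    tables = pd.read_html("https://example.org/some-statistics-page")
    df = tables[0]                                # first HTML table on the page

    # Normalise the headers so later code doesn't trip over stray spaces.
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]

    df = df.dropna(how="all")                     # drop completely empty rows
    df.to_csv("cleaned.csv", index=False)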

For more advanced work, you want to learn a query language and be able to work with relational databases, plus Google BigQuery, the Google Visualisation API (http://code.google.com/apis/charttools/), Google code playgrounds (http://code.google.com/apis/ajax/playground/?type=visualization#org_chart), and other JavaScript tools. The advanced mapping equivalents are ArcGIS or GeoConcept, which let you query geographical data and find stories.
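As a gentle way into query languages before touching BigQuery or a full database server, here is a minimal sketch using SQLite, which ships with Python; the table and column names are made up for illustration:

    # A first taste of SQL without setting up a database server.
    # Table and column names are hypothetical.
    import sqlite3

    conn = sqlite3.connect("stories.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS crimes
                    (neighborhood TEXT, category TEXT, year INTEGER)""")

    # The kind of question a reporter actually asks of the data:
    rows = conn.execute("""
        SELECT neighborhood, COUNT(*) AS incidents
        FROM crimes
        WHERE category = 'burglary' AND year = 2010
        GROUP BY neighborhood
        ORDER BY incidents DESC
        LIMIT 10
    """).fetchall()

    for neighborhood, incidents in rows:
        print(neighborhood, incidents)
    conn.close()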

You could also learn some Ruby for building your own scrapers, or Python for ScraperWiki.
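A bare-bones scraper of the sort ScraperWiki encourages might look like the following Python sketch; the URL and the CSS selector are placeholders to adapt to whatever page you actually want:

    # Minimal scraper sketch: fetch a page, pull out links, save them as CSV.
    # The URL and the "a.release" selector are hypothetical.
    import csv
    import requests
    from bs4 import BeautifulSoup

    response = requests.get("https://example.org/press-releases")
    soup = BeautifulSoup(response.text, "html.parser")

    with open("releases.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["title", "url"])
        for link in soup.select("a.release"):
            writer.writerow([link.get_text(strip=True), link.get("href")])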

Get inspired:

Get the data behind some big data stories you admire, try and find a story, visualise it and blog about it. You’ll find that the whole process starts with the data, and your interpretation of it. That needs to be newsworthy/valuable.

Look to the past!

Edward Tufte’s work is very inspiring (http://www.edwardtufte.com/tufte/); his favourite data visualisation is from 1869! Or what about John Snow’s cholera map? http://www.york.ac.uk/depts/maths/histstat/snow_map.htm

And for good luck here’s an assorted list of visualisation tutorials.

The Times’ Jonathan Richards:

I’d say a couple of blogs.

Others that spring to mind are:

If people want more specific advice, tell them to come to the next London Hack/Hackers and track me down!

The Guardian’s Charles Arthur:

Obvious thing: find a story that will be best told through numbers. (I’m thinking about quizzing my local council about the effects of stopping free swimming for children. The obvious way forward: get numbers for how many children swam before, during, and after the free swimming offer.)

If someone already has the skills for data journalism (which I’d put at (1) understanding statistics and relevance, (2) understanding how to manipulate data, and (3) understanding how to make the data visual), the key, I’d say, is always being able to spot a story that can be told through data – one that only makes sense that way, and where being able to manipulate the data is key to extracting the story. It’s like interviewing the data. Good interviewers know how to get what they want out of the conversation. Ditto good data journalists and their data.

The New York Times’ Aron Pilhofer:

I would start small, and start with something you already know and already do. And always, always, always remember that the goal here is journalism. There is a tendency to focus too much on the skills for the sake of skills, and not enough on how those skills help enable you to do better journalism. Be pragmatic about it, and resist the tendency to think you need to know everything about the techy stuff before you do anything — nothing could be further from the truth.

Less abstractly, I would start out learning some basic computer-assisted reporting skills and then moving from there as your interests/needs dictate. A lot of people see the programmer/journalism thing as distinct from computer-assisted reporting, but I don’t. I see it as a continuum. I see CAR as a “gateway drug” of sorts: Once you start working with small data sets using tools like Excel, Access, MySQL, etc., you’ll eventually hit limits of what you can do with macros and SQL.

Soon enough, you’ll want to be able to script certain things. You’ll want to get data from the web. You’ll want to do things you can only do using some kind of scripting language, and so it begins.
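That moment when a job outgrows a spreadsheet macro is usually only a few lines of scripting. A minimal sketch of the step Pilhofer describes, using only Python’s standard library; the CSV URL and field name are hypothetical:

    # Pull a CSV off the web and tally a column - the kind of chore that pushes
    # you from macros and SQL toward scripting. URL and field name are made up.
    import csv
    import io
    import urllib.request
    from collections import Counter

    with urllib.request.urlopen("https://example.org/inspections.csv") as resp:
        reader = csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8"))
        counts = Counter(row["violation_type"] for row in reader)

    for violation, total in counts.most_common(10):
        print(violation, total)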

But again, the place to start isn’t thinking about all these technologies. The place to start is thinking about how these technologies can enable you to tell stories you would never otherwise be able to tell. And you should start small. Look for little things to start with, and go from there.

February 02 2010

14:49

Hacks and Hackers play with data-driven news

Last Friday’s London-based Hacks and Hackers Day, run by ScraperWiki (a new data tool due to launch in beta soon), provided some excellent inspiration for journalists and developers alike.

In groups, the programmers and journalists paired up to combine journalistic and data knowledge, resulting in some innovative projects: a visualisation showing the average profile of Conservative candidates standing in safe seats for the General Election (the winning project); graphics showing the most common words used for each horoscope sign; and an attempt to tackle the various formats used by data.gov.uk.

One of the projects, ‘They Write For You’, was an attempt to illustrate the political mix of articles written by MPs for British newspapers and broadcasters. Combining byline data with MP name data, the journalists and developers created this pretty mashup, which can be viewed at this link.

The team took the 2008-2010 data from Journalisted and used ScraperWiki, Python, Ruby, and JavaScript to create the visualisation: each newspaper shows a byline breakdown by party. By hovering over a coloured box, users can see which MPs wrote for which newspaper over the same two-year period.
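The team’s code isn’t reproduced here, but the core aggregation behind such a breakdown is simple to sketch in Python, assuming a hypothetical bylines.csv with one row per article and “newspaper” and “party” columns derived from the Journalisted and MP data:

    # Not the team's actual code: a sketch of a byline-by-party breakdown.
    # Assumes a hypothetical bylines.csv with "newspaper" and "party" columns.
    import csv
    from collections import defaultdict

    breakdown = defaultdict(lambda: defaultdict(int))
    with open("bylines.csv", newline="") as f:
        for row in csv.DictReader(f):
            breakdown[row["newspaper"]][row["party"]] += 1

    for paper, parties in sorted(breakdown.items()):
        total = sum(parties.values())
        shares = {party: round(100 * n / total) for party, n in parties.items()}
        print(paper, shares)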

The exact statistics, however, should be treated with some caution, as the information has not yet been cross-checked with other data sets. It would appear, for example, that the Guardian published more stories by MPs than any other title, but this could be because Journalisted holds more information about the Guardian than about its counterparts.

While this analysis is not yet ready to be transformed into a news story, it shows the potential for employing data skills to identify media and political trends.
