
March 08 2011

15:00

Matt Waite: To build a digital future for news, developers must be able to hack at the core of old systems

Editor’s Note: Matt Waite was until recently news technologist at the St. Petersburg Times, where — among many other projects — he was the primary developer behind PolitiFact, which won a Pulitzer Prize. He’s also been a leader of the movement to combine news and code in new and interesting ways.

Matt is now teaching journalism at the University of Nebraska and working with news orgs under the shingle Hot Type Consulting. Here, he talks about his disappointment with the pace and breadth of the evolution of coding and news apps in contemporary journalism.

Pay attention to the noise, and you start to hear signal. There’s an awakening going on — quiet and slow, but it’s there. There are voices talking about data and apps and journalism becoming more than just writers writing and editors editing. There are labs starting and partnerships forming. There was a whole conference late last month — NICAR in Raleigh — that more than ever was a creative collision of words and nerds.

It’s tempting to say that a real critical mass is afoot, marrying journalists and technologists and finally getting us to this “Future of Journalism” thing we keep hearing about. I’ve recently had a job change that’s given me some time to reflect on this movement of journalism+programming.

In a word, I’m disappointed.

Not in what’s been done. There’s some amazing work going on inside newsrooms and out, work that every news publisher and manager should be looking at with jealous, thieving eyes. Things like the Los Angeles Times crime app. It’s amazing. The Chicago Tribune elections app. ProPublica’s Docs app. The list goes on and on.

I’m disappointed in what hasn’t been done. Where we, from inside news organizations, haven’t gone. Where we haven’t been allowed to go.

To understand my disappointment, you have to understand, at a very low level, how news gets published and the minds of the people who are actually responsible for the newspaper arriving on your doorstep.

Evolution, but only on the edges

To most journalists, once copy gets through the editors, through the copy desk, and onto a page, there comes a point where magic happens and poof — the paper appears on the doorstep. But if you’ve seen it, you know it’s not magic: It’s a byzantine series of steps, through exceedingly expensive software and equipment, run in a sequence every night in a manner that can be timed with a stopwatch. Any glitch, hiccup, delay, or bump in the process is a four-alarm emergency, because at the other end of this dance is an army of trucks waiting for bundles of paper. In short, it’s got to work exactly the same way every night or piles of cash get burned by people standing around waiting.

Experimentation with the process isn’t just uncomfortable — it’s dangerous and expensive and threatens the very production of the product. In other words, it doesn’t happen unless it’s absolutely necessary and can demonstrably cut costs.

Knowing that, it’s entirely understandable why many of the people who manage newspapers — who have gone their whole professional lives with this rhythmic production model consciously and subconsciously in their minds — would view the world through that prism. Most newspapers rely on gigantic, expensive, monolithic content management systems that function very much like the production systems that print the paper every day. Inputs go in, magic happens, a website comes out. It works the same way every day or there’s hell to pay.

And around that rhythmic mode of operation, we’ve created comfortable workflows that feed it. And because it’s comfortable, there’s an amazing amount of inertia around all of it. Change is scary. The consequences down the line could be bad. We should go slow.

Now, I’m not going to tell you that experimentation is forbidden in the web space, because it’s not. But that experimentation takes place almost entirely outside the main content management system. Story here, news app there. A blog? A separate software stack. Photo galleries? Made elsewhere, embedded into a CMS page (maybe). Graphics? Same. Got something more, like a whole high school sports stats and scores system? Separate site completely, but stories stay in the CMS. You don’t get them.

In short, experiment all you want, so long as you never touch the core product.

And that is the source of my disappointment. All this talk about a digital future, about moving journalism onto the web, about innovation and saving journalism is just talk until developers are allowed to hack at the very core of the whole product. To argue otherwise is to argue that the story form, largely unchanged from print, is perfect and that changing it is unnecessary. Hogwash.

The evolution of the story form

Now, I’m not saying “Trash the story form! Down with it all!” The story form has been honed over millennia. We’ve been telling stories since we invented language. A story is a very efficient means to get information from one human to another. But to believe that a story has to be a headline, byline, body copy, a publication date, maybe some tags, and maybe a photo — because that’s what some vendor’s one-size-fits-all content management system tells us is all we get — is ludicrous. It’s a dangerous blind spot just waiting to be exploited by competitors.

I believe that all stories are not the same, and that each type of story we do as journalists has opportunities to augment the work with data, structure, and context. There are opportunities to alter how a story fits into place and time. To change the atomic structure of what we do as journalists.

Imagine a crime story that had each location in the story stored as data, providing readers with maps that show not just where the crime happened, but crime rates in those areas over time and recent similar crimes, automatically generated for every crime story that gets written. A crime story that automatically grabs the arrest report or jail record for the accused and pulls it up, automatically following that arrestee and updating the mugshot with their jail status, court status, or adjudication without the reporter having to do anything. Then step back to a page that shows all crime stories and all crime data in your neighborhood or your city. The complete integration of oceans of crime data into the work of journalists, two streams that go on every day without any real connection to each other. Rely on the journalists to tell the story, rely on the data to connect it all together in ways that users will find compelling, interesting, and educational.
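To make that concrete, here is a minimal sketch of a crime story treated as structured data rather than flat copy. Every class, field, and function name here is hypothetical, invented for illustration rather than taken from any real CMS:

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class Location:
        address: str
        latitude: float
        longitude: float

    @dataclass
    class ArrestRecord:
        booking_number: str       # identifier from a hypothetical jail feed
        mugshot_url: str
        status: str               # e.g. "in custody", "released", "adjudicated"
        last_checked: datetime

    @dataclass
    class CrimeStory:
        # the flat fields a one-size-fits-all CMS already gives you
        headline: str
        byline: str
        body: str
        published: datetime
        # the added structure that makes automation possible
        crime_type: str = ""
        locations: list = field(default_factory=list)   # list of Location
        arrests: list = field(default_factory=list)     # list of ArrestRecord

    def similar_nearby(story, incidents, radius=0.01):
        """Return prior incidents of the same type near any location in the story.
        'incidents' is assumed to be a cleaned feed of police incident dicts."""
        return [i for i in incidents
                if i.get("crime_type") == story.crime_type
                and any(abs(i["latitude"] - loc.latitude) < radius
                        and abs(i["longitude"] - loc.longitude) < radius
                        for loc in story.locations)]

With locations and arrest records stored as data instead of buried in body copy, the maps, rate comparisons, and status updates described above can be generated by a scheduled job rather than by a reporter.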

Now take that same concept and apply it to politics. Or sports. Or restaurant reviews. Any section of the paper. Obits, wedding announcements, you name it.

Can your CMS do that? Of course it can’t. The amount of customization, the amount of experimentation, the amount of journalism that would have to go on to make that work is impossible for a vendor selling a product to do. But it’s precisely the kind of experimentation we need to be doing.

Building from the ground up

The prevailing notion in newsrooms, whether stated explicitly or just subconsciously believed, is this print-production mindset. Stories, for the most part, function as they do in print — a snapshot in time, standing alone, unalterable after it’s stamped onto a medium and pushed into the world.

What I’ve never seen is the complete counter-argument to that mindset. The alpha to its omega. Here’s what I think that looks like:

Instead of a single monolithic system, where a baseball game story is the same as a triple murder story, general interest news websites should be a confederation of custom content management systems that handle stories of a specific type. Each system has its own features, pulling data, links, tweets and anything else that can shed light on the topic. Humans + computers. Automated aggregates where they make sense, human judgment where it’s needed. The home page is merely a master aggregation of this confederation.
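Here is a minimal sketch of what that confederation could look like in code, assuming each specialized system can hand the home page a list of its most recent items. The class and field names are hypothetical:

    from datetime import datetime

    class CrimeApp:
        """One member of the confederation, with its own schema and features."""
        def recent_items(self, limit=20):
            # in practice this would query the crime system's own database
            return [{"headline": "Burglary reported downtown",
                     "url": "/crime/123",
                     "published": datetime(2011, 3, 7, 9, 30)}]

    class ElectionsApp:
        def recent_items(self, limit=20):
            return [{"headline": "Council race tightens",
                     "url": "/elections/45",
                     "published": datetime(2011, 3, 8, 8, 0)}]

    def build_home_page(apps, limit=20):
        """The home page is just a master aggregation of the confederation."""
        items = [item for app in apps for item in app.recent_items(limit)]
        return sorted(items, key=lambda i: i["published"], reverse=True)[:limit]

    print(build_home_page([CrimeApp(), ElectionsApp()]))

Each app can change its own schema and features without touching the others; the aggregation layer only cares that every member can answer the same simple question.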

Each area of the site can evolve on its own, given changes in available data, technology, or staff. It’s the complete destruction and rebuilding of every piece of the workflow. Everyone’s job would change when it came to producing the news.

Crazy, you say? Probably. My developer friends and readers with IT backgrounds are spitting their coffee out right now. But is it any more crazy than continuing to use a print-production approach on the web? I don’t think it is. It is the equal and opposite reaction: little innovation at the core vs. a complete custom rebuilding of it. Frankly, I believe neither is sustainable, but only one continues at mass scale. And I believe it’s the wrong one.

While I was at the St. Petersburg Times, we took this approach of rebuilding the core from scratch with PolitiFact. We built it from the ground up, augmenting the story form with database relationships to people, topics, and rulings (among others). We added transparency by making the listing of sources a required part of an item. We took the atomic parts of a fact-check story and we built a new molecule with them. And with that molecule, we built a national audience for a regional newspaper and won a Pulitzer Prize.
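The actual PolitiFact schema isn’t reproduced here, but a minimal sketch of that kind of augmented story form, with database relationships to people, topics, and rulings, and sources required on every item, might look something like this in Django-style models (the field and model names are illustrative guesses, not the real thing):

    from django.db import models

    class Person(models.Model):
        name = models.CharField(max_length=200)

    class Topic(models.Model):
        name = models.CharField(max_length=200)

    class Ruling(models.Model):
        label = models.CharField(max_length=50)   # e.g. "True" through "Pants on Fire"

    class FactCheck(models.Model):
        headline = models.CharField(max_length=300)
        body = models.TextField()
        speaker = models.ForeignKey(Person, on_delete=models.PROTECT)
        topics = models.ManyToManyField(Topic)
        ruling = models.ForeignKey(Ruling, on_delete=models.PROTECT)
        # transparency: sources are a required part of the item, not an afterthought
        sources = models.TextField()
        published = models.DateTimeField()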

Not bad for a bunch of print journalists experimenting with the story form on the web.

I would be lying if I said that I wasn’t disappointed that PolitiFact’s success didn’t unleash a torrent of programmers and journalists and journalist/programmers hacking away on new story forms. It hasn’t, and I am.

But I’m not about to blame programmers in the newsroom. Many that I talk to are excited to experiment in any way they can with journalism and the web. The enemy is what we cling to. And it’s time to let go.

February 17 2011

18:30

How public is public data? With Public Engines v. ReportSee, new access standards could emerge

A recently settled federal court case out in Utah may affect the way news organizations and citizens get access to crime data.

Public Engines, a company that publishes crime statistics for law enforcement agencies, sued ReportSee, which provides similar services, for misappropriating crime data Public Engines makes available on CrimeReports.com. Under the settlement, ReportSee is barred from using data from Public Engines, as well as from asking for data from agencies that work with Public Engines.

At first glance, the companies seem virtually identical, right down to their similar mapping sites CrimeReports.com (Public Engines) and SpotCrime.com (ReportSee). The notable exception is that Public Engines contracts with police and sheriff’s departments for its data and provides tools to manage information. ReportSee, on the other hand, relies on publicly available feeds.

The settlement between the two companies raises a new question: Just what constitutes publicly available data? Is it raw statistics or refined numbers presented by a third party? Governments regularly farm out their data to companies that prepare and package records, but what stands out in this case is that Public Engines effectively laid claim to the information provided to it by law enforcement. This could be problematic for news organizations, developers, and citizens looking to get their hands on data. While still open and available to the public, the information (and the timing of its release) could potentially be dictated by a private company.

“The value in this kind of crime data is distributing it as quickly as possible so the public can interact with it,” Colin Drane, the founder of SpotCrime, told me.

In its news release on the settlement, Public Engines notes that it works with more than 1,600 law enforcement agencies in the US. Greg Whisenant, CEO of Public Engines, said in the statement that the company is pleased with the outcome of the case, concluding, “The settlement ushers in a new era of transparency and accessibility for the general public. It clearly validates our perspective that law enforcement agencies should retain the right to manage and control the data they decide to share.”

Naturally, Drane sees things differently. “I just don’t think people recognize that the data is being, essentially, privatized,” he said.

That may be a slight exaggeration, evidenced by the fact that SpotCrime is still operating. Instead of signing contracts with law enforcement agencies, SpotCrime requests data that is available for free and runs ads on its map pages. The company also partners with local media to run crime maps on news sites.

Though Drane sought to create a business through data mapping, his methods are largely similar to those of news organizations, relying on open data and free mapping tools. And just like news organizations, Drane finds that the hardest part of the job can be negotiating to get records.

“The technology has been here for years, but the willingness to use it is just starting for many cities,” Drane said.

The open data movement has certainly exploded in recent years, from property and tax records at the municipal level all the way up to Data.gov. As a result, news organizations are not only doing data-backed reporting, but also building online features and news apps. And news organizations are not alone, as developers and entrepreneurs like Drane are mining open datasets to try to create tools and fill information needs within communities.

I asked David Ardia of the Citizen Media Law Project whether this case could hinder development of more data products or have broader ramifications for journalists and citizens. The short answer is no, he said, since no ruling was issued. But Public Engines could be emboldened to take action against competitors, Ardia noted — and, as a result, developers looking to do something similar to what Drane has done may think twice about using public data.

“This is just the tip of the iceberg,” Ardia said. “There are tremendous amounts of money to be made in government information and data.”

In this case, Public Engines saw crime data as a proprietary product — and Drane’s company as infringing on its contracts. It also claimed misappropriation under the hot news doctrine, arguing that it gathers and publishes information in a timely manner as part of its business. (An interesting link Ardia points out: On its FAQ page, CrimeReports.com says it does not make crime data downloadable “to the general public for financial and legal reasons.”)

Ardia said the larger question is twofold: first, whether government agencies will let third parties exert control over public data, and, second, who can access that data. As more local and state departments use outside companies to process records, the tax dollars that go toward managing data end up paying to limit the public’s access. Drane and his company were barred from using or asking to use public crime data in certain cities: if crime data is the property of a third party, a police department could either direct people to CrimeReports.com or, Ardia worries, say that it’s not free to make the information available to others.

“This is a problematic trend as governments adapt to and adopt these technologies that improve their use and analysis of information,” Ardia said.

Obviously all of this runs counter to established practice for public records and data in journalism, and Ardia said that it’s likely the issue won’t be settled until a case similar to Public Engines v. ReportSee makes its way to the courts. (We should have a better view of how the hot news doctrine holds up overall, though, after an appeals court rules on the FlyOnTheWall case.) But a better option could be to adapt current open records laws to reflect changes in how data is stored, processed, and accessed, Ardia said. Businesses and developers should be able to build products on a layer of public data, he said, but not exclusively — or at the expense of greater access for the broader public.

“We don’t have to wait for the courts to resolve this. Part of this can be addressed through changes in open records laws,” Ardia said. “Put the onus on agencies to make this data available when they sign agreements with third parties.”

February 01 2011

10:55

Why journalists should be lobbying over police.uk’s crime data

UK police crime maps

Conrad Quilty-Harper writes about the new crime data from the UK police force – and in the process adds another straw to the groaning camel’s back of the government’s so-called transparency agenda:

“It’s useless to residents wanting to find out what was going on at the house around the corner at 3am last night, and it’s useless to individuals who want to build mobile phone applications on top of the data (perhaps to get a chunk of that £6 billion industry open data is supposed to create).

“The site’s limitations are as follows:

  • No IDs for crimes: what if I want to check whether real life crimes have made it onto the map? Sorry.
  • Six crime categories: including “other crimes”, everything from drug dealing to bank robberies in one handy, impossible to understand category.
  • No live data: you mean I have to wait until the end of the next month to see this month’s criminality?!
  • No dates or times: funny how without dates and times I can’t tell which police manager was in charge.
  • Case status: the police know how many crimes go solved or unsolved, why not tell us this?”

This is why people are so concerned about the Public Data Corporation. This is why we need to be monitoring exactly what spending data councils release, and in what format. And this is why we need to continue to press for the expansion of FOI laws. This is what we should be doing. Are we?

October 07 2010

14:00

Los Angeles Times collaborates across the newsroom and with readers to map neighborhood crime

There’s something about the immediacy of the web that makes interactive features seem effortless: One click and the information is there. But of course the feel of the end product is not the same as the process required to get it there. Just ask the Los Angeles Times.

Last week the Times unveiled a new stage in its ongoing mapping project, Mapping L.A. The latest piece lets users check out crime data by neighborhood, including individual crimes and crime trends. Ultimately, the goal is to give locals access to encyclopedia-style information about their neighborhoods, including demographic, crime, and school information. And for reporters, it’s a helpful tool to add context to a story or spot trends. Getting the project where it is now has been a two-year process, drawing on talent across the newsroom and tapping the expertise of the crowd. I spoke with Ben Welsh, the LAT developer working on the project, about what it’s taken to piece it together. Hint: collaboration.

“I was lucky to find some natural allies who had a vision for what we could find out,” Welsh told me. “In some sense it’s the older generation of geek reporters. There’s this whole kind of tradition of that. We talk the same language. They collect all this data — and I want data so we can do stuff online. Even though we don’t have the same bosses, we have this kind of ad hoc alliance.”

Before Welsh could start plotting information, like crime or demographics data, the Times had to back up to a much simpler question: What are the neighborhood boundaries in Los Angeles city and county?

“Because there are no official answers and there are just sort of consensus and history and these things together, we knew from the get-go it was going to be controversial,” Welsh said. “We designed it from the get-go to let people tell us we suck.”

And people did. About 1,500 people weighed in on the first round of the Times’ mapping project. A tool allowed users to create their own boundary maps for neighborhoods. Between the first round and the second, the Times made 100 boundary changes. (Compare the original map to the current one.) More than a year later, “I continue to receive emails that we’re wrong,” Welsh said.

An offshoot project of the neighborhood project was a more targeted question that every Angeleno can answer: “What is the ‘West Side’?” Welsh said the hundreds of responses were impassioned and creative. The West Side project was recently named a finalist for the Online News Association’s annual awards in the community collaboration category.

Welsh has now layered census, school, and crime data into the project. Working with those varied government data sets brings unique problems. “We put all kinds of hours in to clean the data,” Welsh said. “I think a lot of times journalists don’t talk about that part.” At one point, the Times discovered widespread errors in the Los Angeles Police Department data, for example. The department got an early look at the project and supports the Times’ work; it has actually abandoned its own mapping efforts and decided to use the Times’ instead.
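A minimal sketch of one piece of that plumbing, assigning cleaned incident records to neighborhood polygons, might look like this. It assumes the shapely library and GeoJSON-style inputs; none of it is the Times’ actual code:

    from shapely.geometry import shape, Point

    def assign_neighborhoods(incidents, neighborhoods):
        """incidents: dicts with 'latitude' and 'longitude' keys.
        neighborhoods: dicts with a 'name' and a GeoJSON 'geometry'.
        Returns incidents annotated with the neighborhood they fall in."""
        polygons = [(n["name"], shape(n["geometry"])) for n in neighborhoods]
        assigned = []
        for inc in incidents:
            try:
                point = Point(float(inc["longitude"]), float(inc["latitude"]))
            except (KeyError, TypeError, ValueError):
                continue  # dirty records get logged and fixed upstream, not mapped
            for name, poly in polygons:
                if poly.contains(point):
                    assigned.append({**inc, "neighborhood": name})
                    break
        return assigned

Getting to the point where that loop runs cleanly (consistent coordinates, agreed-upon boundaries, de-duplicated records) is where most of the hours go.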

Welsh doesn’t talk about the project in terms of it ever being “finished.” “With everything you add, you hope to make it this living, breathing thing,” he said. In the long run, he hopes the Times will figure out a way to offer a more sophisticated analysis of the data. “That’s a challenging thing,” he said. In the more immediate future, he hopes to expand the geographic footprint of the project.
