
February 28 2012

16:12

How a conference taught me I know nothing

The 2012 Computer-Assisted Reporting conference in St. Louis provided journalists with plenty of new reporting tools. Here's our top-15 list of applications and websites from the weekend. Read More »

February 10 2012

18:29

Meet us at the Computer-Assisted Reporting Conference

Our team is heading to St. Louis on Feb. 23 for the annual computer-assisted reporting conference, and we'd love to meet you. Come to our sessions or stop us in the hall to learn how we can help your newsroom and how to get involved. Read More »
18:00

Still shaping the way people think about news innovation? A few reflections on the new KNC 2.0

As someone who probably has spent more time thinking about the Knight News Challenge than anyone outside of Knight Foundation headquarters — doing a dissertation on the subject will do that to you! — I can’t help but follow its evolution, even after my major research ended in 2010. And evolve it has: from an initial focus on citizen journalism and bloggy kinds of initiatives (all the rage circa 2007, right?) to a later emphasis on business models, visualizations, and data-focused projects (like this one) — among a whole host of other projects including news games, SMS tools for the developing world, crowdsourcing applications, and more.

Now, after five years and $27 million in its first incarnation, Knight News Challenge 2.0 has been announced for 2012, emphasizing speed and agility (three contests a year, eight-week turnarounds on entries) and a new topical focus (the first round is focused on leveraging existing networks). While more information will be coming ahead of the February 27 launch, here are three questions to chew on now.

Does the Knight News Challenge still dominate this space?

The short answer is yes (and I’m not just saying that because, full disclosure, the Knight Foundation is a financial supporter of the Lab). As I’ve argued before, in the news innovation scene, at this crossroads of journalism and technology communities, the KNC has served an agenda-setting kind of function — perhaps not telling news hipsters what to think regarding the future of journalism, but rather telling them what to think about. So while folks might disagree on the Next Big Thing for News, there’s little question that the KNC has helped to shape the substance and culture of the debate and the parameters in which it occurs.

Some evidence for this comes from the contest itself: Whatever theme/trend got funded one year would trigger a wave of repetitive proposals the next. (As Knight said yesterday: “Our concern is that once we describe what we think we might see, we receive proposals crafted to meet our preconception.”)

And yet the longer answer to this question is slightly more nuanced. When the KNC began in 2006, with the first winners named in 2007, it truly was the only game in town — a forum for showing “what news innovation looks like” unlike any other. Nowadays, a flourishing ecosystem of websites (ahem, like this one), aggregators (like MediaGazer), and social media platforms is making the storyline of journalism’s reboot all the more apparent. It’s easier than ever to track who’s trying what, which experiments are working, and so on — and seemingly in real time, as opposed to a once-a-year unveiling. Hence the Knight Foundation’s move to three quick-fire contests a year, “as we try to bring our work closer to Internet speed.”

How should we define the “news” in News Challenge?

One of the striking things I found in my research (discussed in a previous Lab post) was that Knight, in its overall emphasis, has pivoted away from focusing mostly on journalism professionalism (questions like “how do we train/educate better journalists?”) and moved toward a broader concern for “information.” This entails far less regard for who’s doing the creating, filtering, or distributing — rather, it’s more about ensuring that people are informed at the local community level. This shift from journalism to information, reflected in the Knight Foundation’s own transformation and its efforts to shape the field, can be seen, perhaps, like worrying less about doctors (the means) and more about public health (the ends) — even if this pursuit of health outcomes sometimes sidesteps doctors and traditional medicine along the way.

This is not to say that Knight doesn’t care about journalism. Not at all. It still pours millions upon millions of dollars into clearly “newsy” projects — including investigative reporting, the grist of shoe-leather journalism. Rather, this is about Knight trying to rejigger the boundaries of journalism: opening them up to let other fields, actors, and ideas inside.

So, how should you define “news” in your application? My suggestion: broadly.

What will be the defining ethos of KNC 2.0?

This is the big, open, and most interesting question to me. My research on the first two years of KNC 1.0, using a regression analysis, found that contest submissions emphasizing participation and distributed knowledge (like crowdsourcing) were more likely to advance, all things being equal. My followup interviews with KNC winners confirmed this widely shared desire for participation — a feeling that the news process not only could be shared with users, but in fact should be.

I called this an “ethic of participation,” a founding doctrine of news innovation that challenges journalism’s traditional norm of professional control. But perhaps, to some extent, that was a function of the times, during the roughly 2007-2010 heyday of citizen media, with the attendant buzz around user-generated content as the hot early-adopter thing in news — even if news organizations then, as now, struggled to reconcile and incorporate a participatory audience. Even while participation has become more mainstream in journalism, there are still frequent flare-ups, like this week’s flap over breaking news on Twitter, revealing enduring tensions at the “collision of two worlds — when a hierarchical media system in the hands of the few collides with a networked media system open to all,” as Alfred Hermida wrote.

So what about this time around? Perhaps KNC 2.0 will have an underlying emphasis on Big Data, algorithms, news apps, and other things bubbling up at the growing intersection of computer science and journalism. It’s true that Knight is already underwriting a significant push in this area through the (also just-revised) Knight-Mozilla OpenNews project (formerly called the Knight-Mozilla News Technology Partnership — which Nikki Usher and I have written about for the Lab). To what extent is there overlap or synergy here? OpenNews, for 2012, is trying to build on the burgeoning “community around code” in journalism — leveraging the momentum of Hacks/Hackers, NICAR, and ONA with hackfests, code-swapping, and online learning. KNC 2.0, meanwhile, talks about embracing The Hacker Way described by Mark Zuckerberg — but at the same time backs away a bit from its previous emphasis on open source as a prerequisite. It’ll be interesting to see how computational journalism — explained well in this forthcoming paper (PDF here) by Terry Flew et al. in Journalism Practice — figures into KNC 2.0.

Regardless, the Knight News Challenge is worth watching for what it reveals about the way people — journalists and technologists, organizations and individuals, everybody working in this space — talk about and make sense of “news innovation”: what it means, where it’s taking us, and why that matters for the future of journalism.

January 10 2012

15:20

The Top 10 Data-Mining Links of 2011

Overview is a project to create an open-source document-mining system for investigative journalists and other curious people. We've written before about the goals of the project, and we're developing some new technology, but mostly we're stealing it from other fields.


The following are some of the best ideas we saw in 2011, the data-mining work that we found most inspirational. Many of these links are educational resources for learning about specific technologies. Some of this work illuminates how algorithms and humans treat information differently. Others are just amazing, mind-bending work.

1. What do your connections say about you? A lot. It is possible to accurately predict your political orientation solely on the basis of your network on Twitter. You can also work out gender and other things from public information.
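For a rough sense of how that kind of prediction works, here is a minimal sketch: train a classifier on a binary "who follows whom" matrix. Everything below is a synthetic stand-in, not the researchers' actual data or method.

```python
# Minimal sketch: predict a label (e.g., political orientation) from who a user
# follows. The follow matrix and labels are synthetic stand-ins for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users, n_accounts = 500, 200

# Binary matrix: follows[i, j] = 1 if user i follows account j.
follows = rng.integers(0, 2, size=(n_users, n_accounts))
# Hypothetical labels, loosely tied to a subset of accounts so there is signal.
labels = (follows[:, :20].sum(axis=1) + rng.normal(0, 2, n_users) > 10).astype(int)

X_train, X_test, y_train, y_test = train_test_split(follows, labels, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```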

2. Free textbooks from Stanford University. "Introduction to Information Retrieval" teaches you how a search engine works, in great detail. "Mining Massive Data Sets" covers a variety of big-data principles that apply to different types of information.

3. We're not above having a list of lists. Here's the Data Mining Blog's top 5 articles. Most of these are foundational, covering basic philosophy and technique such as choosing variables, finding clusters, and deciding what you're looking for.

4. The MINE technique looks for patterns between hundreds or thousands of variables -- say, patterns of gene expression inside a single cell. It's very general, and finds not only individual relationships but networks of cause and effect. Here's a nifty video, here's the original paper, and here's one statistician's review.
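If you want to experiment with this style of pairwise screening yourself, here is a hedged sketch that assumes the third-party minepy package (an implementation of the MIC statistic at the heart of MINE); the data is synthetic and the code is not the authors'.

```python
# Rough sketch: scan all variable pairs for association using the maximal
# information coefficient (MIC). Assumes the third-party `minepy` package.
import itertools
import numpy as np
from minepy import MINE

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 6))                            # 500 samples, 6 variables
data[:, 1] = np.sin(data[:, 0]) + rng.normal(0, 0.1, 500)   # plant one non-linear pair

mine = MINE(alpha=0.6, c=15)
scores = {}
for i, j in itertools.combinations(range(data.shape[1]), 2):
    mine.compute_score(data[:, i], data[:, j])
    scores[(i, j)] = mine.mic()

# The planted pair (0, 1) should rise to the top.
for pair, mic in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(mic, 3))
```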

5. This is one of those papers that really changed the way I look at things. How do we know when a data visualization shows us something that is "actually there," as opposed to an artifact of the numbers? "Graphical Inference for Infovis" provides one excellent answer, based on a clever analogy with numerical statistics.
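The paper's core device is the "lineup": hide the plot of the real data among plots of shuffled (null) data and see whether anyone can spot it. A minimal sketch of that idea, using synthetic data:

```python
# Sketch of the "lineup" protocol from "Graphical Inference for Infovis":
# the real scatterplot is hidden among null plots made by permuting y.
# If readers can't pick out the real panel, the pattern may be an artifact.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.3 * x + rng.normal(size=200)        # a weak but real relationship

real_slot = rng.integers(0, 9)            # which of the 9 panels gets the real data
fig, axes = plt.subplots(3, 3, figsize=(7, 7))
for slot, ax in enumerate(axes.flat):
    yy = y if slot == real_slot else rng.permutation(y)   # nulls: shuffle y
    ax.scatter(x, yy, s=5)
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()
print("The real data was in panel", real_slot)
```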

6. Lots of text-mining work uses "clustering" or "classification" techniques to sort documents into topics. But doesn't a categorization algorithm impose its own preconceptions? This is a deep issue, which you might think of as "framing" in code. To explore this question Justin Grimmer and Gary King went meta with a system that visualizes all possible categorizations of a document set, and how they relate.
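For context, here is a minimal sketch of the kind of off-the-shelf topic clustering the item refers to, using TF-IDF features and k-means. It is a generic illustration, not Grimmer and King's meta-categorization system.

```python
# Generic document clustering sketch: TF-IDF features plus k-means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "city council approves new budget",
    "mayor vetoes council budget plan",
    "quarterback injured in season opener",
    "home team wins in overtime thriller",
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for doc, label in zip(docs, km.labels_):
    print(label, doc)
```

Note that the choice of features and the number of clusters are decisions the coder makes up front; that is exactly the "framing in code" issue the item raises.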

7. A few years ago Google showed that the number of searches for "flu" was a great predictor of the actual number of outbreaks in a given location -- faster and more specific than the Centers for Disease Control's own surveillance data. The team has now expanded the technique into Google Correlate, which instantly scans through petabytes of data to find search terms which follow any user-supplied time series. Here's New Scientist taking it for a test drive.
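The underlying operation is easy to sketch, even if running it over petabytes of query logs is not: rank a dictionary of candidate time series by how strongly each correlates with a user-supplied target. All names and data below are made up.

```python
# Toy version of the Google Correlate idea: find the candidate series that best
# tracks a target time series. Data and term names are synthetic.
import numpy as np

rng = np.random.default_rng(2)
weeks = 104
target = np.sin(np.linspace(0, 8 * np.pi, weeks)) + rng.normal(0, 0.2, weeks)

candidates = {f"term_{i}": rng.normal(size=weeks) for i in range(1000)}
candidates["flu symptoms"] = target + rng.normal(0, 0.3, weeks)   # a planted match

def correlation(a, b):
    return float(np.corrcoef(a, b)[0, 1])

ranked = sorted(candidates.items(), key=lambda kv: -correlation(target, kv[1]))
for term, series in ranked[:3]:
    print(term, round(correlation(target, series), 3))
```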


8. Not content with free professional textbooks, Stanford has created two free online courses for machine learning and natural language processing. Both are live-streamed lecture series taught by experts, with homework. Learning these intricate technologies has never been easier.

9. Lots of people have speculated about the role of social media in protest movements. A team of researchers looked at the data, analyzing a huge set of tweets from the "May 20" protests in Spain last year. How do protests spread from social media? Now we have at least one solid answer.

10. And the craziest data-mining link we ran across in 2011: IBM's DeepQA project, which beat human Jeopardy champions. This project looks into an unstructured database to correctly answer about 80% of all general questions posed to it, in just a few seconds. Here's a TED talk, and here's the technical paper that explains how it works. I can't tell you how badly I want one of these in the newsroom. If enough journalist hackers build on each other's work, maybe one day ...

Happy data mining! We'll be releasing our own prototype document-mining system, and the source, at the NICAR conference next month. If these are the sorts of algorithms you like to play with, we're also hiring programmers who want to bring these sorts of advanced techniques within everyone's reach.

March 08 2011

15:00

Matt Waite: To build a digital future for news, developers must be able to hack at the core of old systems

Editor’s Note: Matt Waite was until recently news technologist at the St. Petersburg Times, where — among many other projects — he was the primary developer behind PolitiFact, which won a Pulitzer Prize. He’s also been a leader of the movement to combine news and code in new and interesting ways.

Matt is now teaching journalism at the University of Nebraska and working with news orgs under the shingle Hot Type Consulting. Here, he talks about his disappointment with the pace and breadth of the evolution of coding and news apps in contemporary journalism.

Pay attention to the noise, and you start to hear signal. There’s an awakening going on — quiet and slow, but it’s there. There are voices talking about data and apps and journalism becoming more than just writers writing and editors editing. There are labs starting and partnerships forming. There was a whole conference late last month — NICAR in Raleigh — that more than ever was a creative collision of words and nerds.

It’s tempting to say that a real critical mass is afoot, marrying journalists and technologists and finally getting us to this “Future of Journalism” thing we keep hearing about. I’ve recently had a job change that’s given me some time to reflect on this movement of journalism+programming.

In a word, I’m disappointed.

Not in what’s been done. There’s some amazing work going on inside newsrooms and out, work that every news publisher and manager should be looking at with jealous, thieving eyes. Things like the Los Angeles Times crime app. It’s amazing. The Chicago Tribune elections app. ProPublica’s Docs app. The list goes on and on.

I’m disappointed in what hasn’t been done. Where we, from inside news organizations, haven’t gone. Where we haven’t been allowed to go.

To understand my disappointment, you have to understand, at a very low level, how news gets published and the minds of the people who are actually responsible for the newspaper arriving on your doorstep.

Evolution, but only on the edges

To most journalists, once copy gets through the editors, through the copy desk, and onto a page, there comes a point where magic happens and poof — the paper appears on the doorstep. But if you’ve seen it, you know it’s not magic: It’s a byzantine series of steps, through exceedingly expensive software and equipment, run in a sequence every night in a manner that can be timed with a stopwatch. Any glitch, hiccup, delay, or bump in the process is a four-alarm emergency, because at the other end of this dance is an army of trucks waiting for bundles of paper. In short, it’s got to work exactly the same way every night or piles of cash get burned by people standing around waiting.

Experimentation with the process isn’t just uncomfortable — it’s dangerous and expensive and threatens the very production of the product. In other words, it doesn’t happen unless it’s absolutely necessary and can demonstrably cut costs.

Knowing that, it’s entirely understandable why many of the people who manage newspapers — who have gone their whole professional lives with this rhythmic production model consciously and subconsciously in their minds — would view the world through that prism. Most newspapers rely on gigantic, expensive, monolithic content management systems that function very much like the production systems that print the paper every day. Inputs go in, magic happens, a website comes out. It works the same way every day or there’s hell to pay.

And around that rhythmic mode of operation, we’ve created comfortable workflows that feed it. And because it’s comfortable, there’s an amazing amount of inertia around all of it. Change is scary. The consequences down the line could be bad. We should go slow.

Now, I’m not going to tell you that experimentation is forbidden in the web space, because it’s not. But that experimentation takes place almost entirely outside the main content management system. Story here, news app there. A blog? A separate software stack. Photo galleries? Made elsewhere, embedded into a CMS page (maybe). Graphics? Same. Got something more, like a whole high school sports stats and scores system? Separate site completely, but stories stay in the CMS. You don’t get them.

In short, experiment all you want, so long as you never touch the core product.

And that is the source of my disappointment. All this talk about a digital future, about moving journalism onto the web, about innovation and saving journalism is just talk until developers are allowed to hack at the very core of the whole product. To argue otherwise is to argue that the story form, largely unchanged from print, is perfect and to change it is unnecessary. Hogwash.

The evolution of the story form

Now, I’m not saying “Trash the story form! Down with it all!” The story form has been honed over millennia. We’ve been telling stories since we invented language. A story is a very efficient means to get information from one human to another. But to believe that a story has to be a headline, byline, body copy, a publication date, maybe some tags, and maybe a photo — because that’s what some vendor’s one-size-fits-all content management system tells us is all we get — is ludicrous. It’s a dangerous blind spot just waiting to be exploited by competitors.

I believe that not all stories are the same, and that each type of story we do as journalists has opportunities to augment the work with data, structure, and context. There are opportunities to alter how a story fits into place and time. To change the atomic structure of what we do as journalists.

Imagine a crime story in which every location mentioned is stored as structured data, providing readers with maps that show not just where the crime happened, but crime rates in those areas over time and recent similar crimes, automatically generated for every crime story that gets written. A crime story that automatically grabs the arrest report or jail record for the accused and pulls it up, automatically following that arrestee and updating the mugshot with their jail status, court status, or adjudication without the reporter having to do anything. Then step back to a page that shows all crime stories and all crime data in your neighborhood or your city. That would be the complete integration of oceans of crime data with the work of journalists, two streams that currently flow every day without any real connection to each other. Rely on the journalists to tell the story, and rely on the data to connect it all together in ways that users will find compelling, interesting, and educational.
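To make that concrete, here is one hedged sketch of what such a structured crime story could look like as a data model. Every class and field name is hypothetical, invented for illustration rather than drawn from any newsroom's system.

```python
# Hypothetical data model for a structured crime story, sketched with dataclasses.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Location:
    address: str
    latitude: float
    longitude: float

@dataclass
class ArrestRecord:
    arrestee: str
    booking_date: date
    mugshot_url: str
    case_status: str = "pending"   # updated automatically as the case moves along

@dataclass
class CrimeStory:
    headline: str
    byline: str
    body: str
    published: date
    locations: list[Location] = field(default_factory=list)
    arrests: list[ArrestRecord] = field(default_factory=list)

    def map_points(self):
        """Coordinates a mapping layer could render alongside the story."""
        return [(loc.latitude, loc.longitude) for loc in self.locations]
```

Once locations and arrest records are data rather than sentences, the maps, mugshot updates, and neighborhood rollups described above can be generated automatically for every story of that type.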

Now take that same concept and apply it to politics. Or sports. Or restaurant reviews. Any section of the paper. Obits, wedding announcements, you name it.

Can your CMS do that? Of course it can’t. The amount of customization, the amount of experimentation, the amount of journalism that would have to go on to make that work is impossible for a vendor selling a product to do. But it’s precisely the kind of experimentation we need to be doing.

Building from the ground up

The prevailing notion in newsrooms, whether stated explicitly or just subconsciously believed, is this print-production mindset. Stories, for the most part, function as they do in print — a snapshot in time, alone by itself, unalterable after it’s stamped onto a medium and pushed into the world.

What I’ve never seen is the complete counter-argument to that mindset. The alpha to its omega. Here’s what I think that looks like:

Instead of a single monolithic system, where a baseball game story is the same as a triple murder story, general interest news websites should be a confederation of custom content management systems that handle stories of a specific type. Each system has its own features, pulling data, links, tweets and anything else that can shed light on the topic. Humans + computers. Automated aggregates where they make sense, human judgment where it’s needed. The home page is merely a master aggregation of this confederation.
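As a hedged sketch of that confederation idea (class and field names invented for illustration), each type-specific system carries its own structure, and the home page simply merges whatever the systems expose:

```python
# Illustrative sketch: two type-specific "mini CMSs" and a home page that
# aggregates their items. Names and fields are hypothetical.
from datetime import datetime

class GameStory:
    def __init__(self, teams, score, recap, published):
        self.teams, self.score, self.recap, self.published = teams, score, recap, published

    def summary(self):
        return f"{self.teams}: {self.score}. {self.recap}"

class CrimeBrief:
    def __init__(self, charge, location, published):
        self.charge, self.location, self.published = charge, location, published

    def summary(self):
        return f"{self.charge} reported near {self.location}"

def home_page(feeds):
    """Merge items from every type-specific system, newest first."""
    items = [item for feed in feeds for item in feed]
    return [i.summary() for i in sorted(items, key=lambda i: i.published, reverse=True)]

sports = [GameStory("Cubs vs. Cardinals", "4-2", "Walk-off in the ninth", datetime(2011, 3, 7))]
crime = [CrimeBrief("Burglary", "5th and Main", datetime(2011, 3, 8))]
print("\n".join(home_page([sports, crime])))
```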

Each area of the site can evolve on its own, given changes in available data, technology, or staff. It’s the complete destruction and rebuilding of every piece of the workflow. Everyone’s job would change when it came to producing the news.

Crazy, you say? Probably. My developer friends and readers with IT backgrounds are spitting their coffee out right now. But is it any more crazy than continuing to use a print-production approach on the web? I don’t think it is. It is the equal and opposite reaction: little innovation at the core vs. a complete custom rebuilding of it. Frankly, I believe neither is sustainable, but only one continues at mass scale. And I believe it’s the wrong one.

While I was at the St. Petersburg Times, we took this approach of rebuilding the core from scratch with PolitiFact. We built it from the ground up, augmenting the story form with database relationships to people, topics, and rulings (among others). We added transparency by making the listing of sources a required part of an item. We took the atomic parts of a fact-check story and we built a new molecule with them. And with that molecule, we built a national audience for a regional newspaper and won a Pulitzer Prize.
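A hedged sketch of that kind of augmented fact-check item, with the required-sources rule built in, might look like the following. It is illustrative only, not PolitiFact's actual schema.

```python
# Illustrative fact-check item: relationships to a speaker and topics, a ruling,
# and a transparency rule that refuses to accept an item without sources.
from dataclasses import dataclass, field

@dataclass
class FactCheck:
    statement: str
    speaker: str
    ruling: str                      # e.g., "True", "Half True", "Pants on Fire"
    topics: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Transparency requirement: no sources, no fact-check.
        if not self.sources:
            raise ValueError("A fact-check must list at least one source.")
```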

Not bad for a bunch of print journalists experimenting with the story form on the web.

I would be lying if I said that I wasn’t disappointed that PolitiFact’s success didn’t unleash a torrent of programmers and journalists and journalist/programmers hacking away on new story forms. It hasn’t and I am.

But I’m not about to blame programmers in the newsroom. Many that I talk to are excited to experiment in any way they can with journalism and the web. The enemy is what we cling to. And it’s time to let go.

February 25 2011

04:48

Data Visualization Tools, Slides and Links from NICAR11

The first day of CAR2011 was stuffed full of information, so much so that the only way to keep up with everything is to keep a log of what people have been sharing.

I’ll update this post throughout the conference and organize it better over the weekend. In the meantime, prepare to have your mind blown.

Got links from sessions you attended? Post them in comments and I’ll add them to this list.

References

Analysis-ready census data (from USA Today, available to NICAR members only)
A directory of statistics bureaus by country (from Statistics Sweden)
Numberway.com – look up phone numbers around the world
Little Sis – visualizing the networks of social, financial and political power
Data Visualization for Beginners (from the CAR2011 conference blog)
Tracking the Economy and Business (from the CAR2011 conference blog)
Getting into a data-oriented mindset (from Mary Jo Webster and Wendell Cochran)

Presentations

Almost Scraping: Web Scraping without Programming (from Michelle Minkoff and Matt Waite)

Data Visualization with JavaScript and HTML5 (from Jeff Larson)

PostGIS is Your New Bicycle – be wowed by a free alternative to costly desktop GIS (from Mike Corey and Ben Welsh)

Software & Tools

API Playground – try APIs, no coding skills necessary
ChangeTracker from ProPublica – track changes to any website
Google Fusion Tables
Needlebase
Protovis
R statistical analysis software
Simile Timeline
TimeFlow

Work Samples

The Killing Roads – interactive map of highway accidents in Norway



March 31 2010

14:00

Collaboration’s power: ProPublica’s healthcare bill viewer

That very cool, side-by-side comparison of the Senate and House health care bills ProPublica launched before the health care reconciliation vote? It came about over coffee.

Jeff Larson, the outfit’s news applications developer, and Olga Pierce, its health reporter, were taking a break from the proceedings at this year’s NICAR conference earlier this month in Phoenix. They began chatting about what ProPublica might do to help people make sense of the House reconciliation version of the Senate bill passed late last year.

“I had this Platonic idea in my head for diffed versions of the documents,” Larson says; and he and Pierce, over their coffee, realized that the reconciliation — which was, at the time, imminent — created the need for a tool that would both leverage and enable textual comparison. They ran the idea for a side-by-side bill viewer by Scott Klein, ProPublica’s news applications editor, and Klein green-lighted the tool. Building the tool in time for Sunday’s vote would require a less-than-two-day turnaround — a challenge made more acute by the fact that the reconciliation version released by the House wasn’t a new version at all, but rather “a 150-page list of amendments to the Senate bill (’strike paragraph 4,’ ‘insert this new sentence in paragraph B…’).” So “it was one of those moments when it was like, ‘Okay, it’s go time,’” Pierce says.
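For a sense of the underlying idea, here is a minimal sketch of a side-by-side document comparison using Python's standard difflib module. ProPublica's tool was custom-built, so this is purely illustrative, and the bill text below is invented.

```python
# Minimal side-by-side diff of two document versions using the standard library.
import difflib

senate_text = [
    "Sec. 1. Coverage begins in 2014.",
    "Sec. 2. Subsidies for exchange plans.",
]
reconciled_text = [
    "Sec. 1. Coverage begins in 2014.",
    "Sec. 2. Expanded subsidies for exchange plans.",
]

html = difflib.HtmlDiff(wrapcolumn=60).make_file(
    senate_text, reconciled_text, fromdesc="Senate bill", todesc="Reconciliation"
)
with open("bill_comparison.html", "w") as f:
    f.write(html)   # open in a browser for a highlighted, side-by-side view
```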

They returned to ProPublica offices on Thursday and set to work: Larson, coding the infrastructure that would enable side-by-side document viewing; Pierce, entering the changes in the reconciliation markup — one by one, manually, via cut-and-paste. She worked until 6 a.m. on Friday (“it was an exercise in Zen, basically,” she says); later that morning, the team added reinforcements to help her wrap up the job. By Friday afternoon, the comparison tool was live, and being linked, and raking in kudos from around the web.

Which would simply be a nice little vignette — an Engine That Could story, with a startuppy twist — except that it also offers a nice little lesson. Because it wasn’t just ingenuity and industry that led to the quick creation of a very useful tool; it was also, even more importantly, interactivity. Pierce and Larson, who sit just feet away from each other in the ProPublica newsroom, regularly converse about new applications that will make the most of the data Pierce gathers and employs in her work. “I can just stroll over and be like, ‘Okay, I have this idea…’” she says. (“And then Jeff goes like this,” she adds, putting her head in her hands.)

They laugh, but they also see the value in that kind of casual conversation — particularly now, as reporting and coding become increasingly mutualized. “In other places, I would be on an entirely different floor than Olga, or maybe even in a different building,” Larson says. “And we just never would interface and come up with these ideas.” ProPublica’s newsroom, though — a large spread on the 23rd floor of a lower-Manhattan high-rise, with everyone save for the top editors and a few business-side staffers sharing cube space in an open layout — encourages interaction and idea-sharing. Among all staff, but in particular between the tech side and the editorial: two groups that are all too often separated in newsrooms, not just rhetorically, but geographically. Too often, Larson says, “there’s kind of a Chinese wall.” But a good layout can make all the difference; and that’s just as true for newsrooms as it is for the news itself.
