
May 17 2013

18:28

How is algorithmic objectivity related to journalistic objectivity?

Today at New York University, a bunch of smart people are gathered at the Governing Algorithms conference.

Algorithms are increasingly invoked as powerful entities that control, govern, sort, regulate, and shape everything from financial trades to news media. Nevertheless, the nature and implications of such orderings are far from clear. What exactly is it that algorithms “do”? What is the role attributed to “algorithms” in these arguments? How can we turn the “problem of algorithms” into an object of productive inquiry? This conference sets out to explore the recent rise of algorithms as an object of interest in scholarship, policy, and practice.

If this interests you, I’d suggest following #govalgo on Twitter, checking out the proposed pre-conference reading list, and looking at the discussion papers submitted. One that stood out to me was Tarleton Gillespie’s “The Relevance of Algorithms,” which connects the idea that algorithms are “objective” to journalists’ conception of the same idea (emphasis all mine):

This assertion of algorithmic objectivity plays in many ways an equivalent role to the norm of objectivity in Western journalism. Like search engines, journalists have developed tactics for determining what is most relevant, how to report it, and how to assure its relevance — a set of practices that are relatively invisible to their audience, a goal that they admit is messier to pursue than they might appear, and a principle that helps set aside but does not eradicate value judgments and personal politics. These institutionalized practices are animated by a conceptual promise that, in the discourse of journalism, is regularly articulated (or overstated) as a kind of totem. Journalists use the norm of objectivity as a “strategic ritual” (Tuchman 1972), to lend public legitimacy to knowledge production tactics that are inherently precarious. “Establishing jurisdiction over the ability to objectively parse reality is a claim to a special kind of authority” (Schudson and Anderson 2009, 96).

Journalistic and algorithmic objectivities are by no means the same. Journalistic objectivity depends on an institutional promise of due diligence, built into and conveyed via a set of norms journalists learned in training and on the job; their choices represent a careful expertise backed by a deeply infused, philosophical and professional commitment to set aside their own biases and political beliefs. The promise of the algorithm leans much less on institutional norms and trained expertise, and more on a technologically inflected promise of mechanical neutrality. Whatever choices are made are presented both as distant from the intervention of human hands, and as submerged inside of the cold workings of the machine.

But in both, legitimacy depends on accumulated guidelines for the proceduralization of information selection. The discourses and practices of objectivity have come to serve as a constitutive rule of journalism (Ryfe 2006). Objectivity is part of how journalists understand themselves and what it means to be a journalist. It is part of how their work is evaluated, by editors, colleagues, and their readers. It is a defining signal by which journalists even recognize what counts as journalism. The promise of algorithmic objectivity, too, has been palpably incorporated into the working practices of algorithm providers, constitutively defining the function and purpose of the information service. When Google includes in its “Ten Things We Know to Be True” manifesto that “Our users trust our objectivity and no short-term gain could ever justify breaching that trust,” this is neither spin nor corporate Kool-Aid. It is a deeply ingrained understanding of the public character of Google’s information service, one that both influences and legitimizes many of its technical and commercial undertakings, and helps obscure the messier reality of the service it provides.

The Tuchman reference is to Gaye Tuchman’s landmark 1972 piece “Objectivity as Strategic Ritual: An Examination of Newsmen’s Notions of Objectivity.” The Schudson/Anderson reference is Michael Schudson and C.W. Anderson’s “Objectivity, Professionalism, and Truth Seeking in Journalism” (2009). The Ryfe reference is David Ryfe’s “The Nature of News Rules” (2006).

February 10 2012

18:00

Still shaping the way people think about news innovation? A few reflections on the new KNC 2.0

As someone who probably has spent more time thinking about the Knight News Challenge than anyone outside of Knight Foundation headquarters — doing a dissertation on the subject will do that to you! — I can’t help but follow its evolution, even after my major research ended in 2010. And evolve it has: from an initial focus on citizen journalism and bloggy kinds of initiatives (all the rage circa 2007, right?) to a later emphasis on business models, visualizations, and data-focused projects (like this one) — among a whole host of other projects including news games, SMS tools for the developing world, crowdsourcing applications, and more.

Now, after five years and $27 million in its first incarnation, Knight News Challenge 2.0 has been announced for 2012, emphasizing speed and agility (three contests a year, eight-week turnarounds on entries) and a new topical focus (the first round is focused on leveraging existing networks). While more information will be coming ahead of the February 27 launch, here are three questions to chew on now.

Does the Knight News Challenge still dominate this space?

The short answer is yes (and I’m not just saying that because, full disclosure, the Knight Foundation is a financial supporter of the Lab). As I’ve argued before, in the news innovation scene, at this crossroads of journalism and technology communities, the KNC has served an agenda-setting kind of function — perhaps not telling news hipsters what to think regarding the future of journalism, but rather telling them what to think about. So while folks might disagree on the Next Big Thing for News, there’s little question that the KNC has helped to shape the substance and culture of the debate and the parameters in which it occurs.

Some evidence for this comes from the contest itself: Whatever theme/trend got funded one year would trigger a wave of repetitive proposals the next. (As Knight said yesterday: “Our concern is that once we describe what we think we might see, we receive proposals crafted to meet our preconception.”)

And yet the longer answer to this question is slightly more nuanced. When the KNC began in 2006, with the first winners named in 2007, it truly was the only game in town — a forum for showing “what news innovation looks like” unlike any other. Nowadays, a flourishing ecosystem of websites (ahem, like this one), aggregators (like MediaGazer), and social media platforms is making the storyline of journalism’s reboot all the more apparent. It’s easier than ever to track who’s trying what, which experiments are working, and so on — and seemingly in real time, as opposed to a once-a-year unveiling. Hence the Knight Foundation’s move to three quick-fire contests a year, “as we try to bring our work closer to Internet speed.”

How should we define the “news” in News Challenge?

One of the striking things I found in my research (discussed in a previous Lab post) was that Knight, in its overall emphasis, has pivoted away from focusing mostly on journalism professionalism (questions like “how do we train/educate better journalists?”) and moved toward a broader concern for “information.” This entails far less regard for who’s doing the creating, filtering, or distributing — rather, it’s more about ensuring that people are informed at the local community level. This shift from journalism to information, reflected in the Knight Foundation’s own transformation and its efforts to shape the field, can be seen, perhaps, as worrying less about doctors (the means) and more about public health (the ends) — even if this pursuit of health outcomes sometimes sidesteps doctors and traditional medicine along the way.

This is not to say that Knight doesn’t care about journalism. Not at all. It still pours millions upon millions of dollars into clearly “newsy” projects — including investigative reporting, the grist of shoe-leather journalism. Rather, this is about Knight trying to rejigger the boundaries of journalism: opening them up to let other fields, actors, and ideas inside.

So, how should you define “news” in your application? My suggestion: broadly.

What will be the defining ethos of KNC 2.0?

This is the big, open, and most interesting question to me. My research on the first two years of KNC 1.0, using a regression analysis, found that contest submissions emphasizing participation and distributed knowledge (like crowdsourcing) were more likely to advance, all things being equal. My followup interviews with KNC winners confirmed this widely shared desire for participation — a feeling that the news process not only could be shared with users, but in fact should be.

I called this an “ethic of participation,” a founding doctrine of news innovation that challenges journalism’s traditional norm of professional control. But perhaps, to some extent, that was a function of the times, during the roughly 2007-2010 heyday of citizen media, with the attendant buzz around user-generated content as the hot early-adopter thing in news — even if news organizations then, as now, struggled to reconcile and incorporate a participatory audience. Even while participation has become more mainstream in journalism, there are still frequent flare-ups, like this week’s flap over breaking news on Twitter, revealing enduring tensions at the “collision of two worlds — when a hierarchical media system in the hands of the few collides with a networked media system open to all,” as Alfred Hermida wrote.
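
(A methods aside for the curious: “more likely to advance, all things being equal” is the language of logistic regression. Here is a minimal sketch of that kind of analysis; it is not the study’s actual code or data, and every variable below is invented.)

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 200

    # Synthetic stand-in for coded contest entries (the real study
    # coded actual KNC submissions; these features are invented).
    entries = pd.DataFrame({
        "participation":  rng.integers(0, 2, n),  # 1 = emphasizes participation
        "business_model": rng.integers(0, 2, n),
        "local_focus":    rng.integers(0, 2, n),
    })
    logit_p = -1.0 + 1.2 * entries["participation"]
    entries["advanced"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

    # Does a participation emphasis raise the odds of advancing,
    # holding the other coded features constant?
    model = smf.logit("advanced ~ participation + business_model + local_focus",
                      data=entries).fit(disp=False)
    print(model.summary())
    print(np.exp(model.params))  # odds ratios, "all things being equal"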

So what about this time around? Perhaps KNC 2.0 will have an underlying emphasis on Big Data, algorithms, news apps, and other things bubbling up at the growing intersection of computer science and journalism. It’s true that Knight is already underwriting a significant push in this area through the (also just-revised) Knight-Mozilla OpenNews project (formerly called the Knight-Mozilla News Technology Partnership — which Nikki Usher and I have written about for the Lab). To what extent is there overlap or synergy here? OpenNews, for 2012, is trying to build on the burgeoning “community around code” in journalism — leveraging the momentum of Hacks/Hackers, NICAR, and ONA with hackfests, code-swapping, and online learning. KNC 2.0, meanwhile, talks about embracing The Hacker Way described by Mark Zuckerberg — but at the same time backs away a bit from its previous emphasis on open source as a prerequisite. It’ll be interesting to see how computational journalism — explained well in this forthcoming paper (PDF here) by Terry Flew et al. in Journalism Practice — figures into KNC 2.0.

Regardless, the Knight News Challenge is worth watching for what it reveals about the way people — journalists and technologists, organizations and individuals, everybody working in this space — talk about and make sense of “news innovation”: what it means, where it’s taking us, and why that matters for the future of journalism.

January 12 2012

19:30

What would a Google News Plus Your World look like?

How soon until we get a Google News “Plus Your World”?

With the introduction of the oddly extraterrestrial-sounding Google “Search Plus Your World”, the company proclaimed that its latest experiment is “transforming Google into a search engine that understands not only content, but also people and relationships.” The world of search gets split up and re-filtered through the things Google knows you like and information from the people around you.

Now, since the announcement there’s been much controversy over how “the things Google knows you like” seems to be driven almost entirely by Google+, not larger competitors Facebook and Twitter, which has led to cries for an antitrust inquiry. But let’s set that aside for a moment and think about what the underlying idea of Search Plus Your World could mean for Google News.

Here’s what we know of how Search Plus Your World works:

1. Personal Results, which enable you to find information just for you, such as Google+ photos and posts—both your own and those shared specifically with you, that only you will be able to see on your results page;

2. Profiles in Search, both in autocomplete and results, which enable you to immediately find people you’re close to or might be interested in following; and,

3. People and Pages, which help you find people profiles and Google+ pages related to a specific topic or area of interest, and enable you to follow them with just a few clicks. Because behind most every query is a community.

Let’s say you were interested in Republican frontrunner Mitt Romney. Taking the normal route through Google News, you’d most likely try searching for “mitt romney” or click on the newly added Elections header and get links to stories from the usual suspects like USA Today, The Wall Street Journal, The New York Times, and more.

If you added a “Search Plus” layer to that, what might you expect? If your Google+ friends are sharing stories about Romney, those stories might get shoved to the top. That would likely mean they’d be more closely aligned with your friends’ political perspectives; your liberal friends are probably sharing anti-Romney pieces, your conservative ones pro-Romney pieces. (Well, unless they’re conservatives who don’t like Romney, but that’s another issue.) You might also see individual Google+ pages from Romney, other G.O.P. candidates — and maybe even the G+ pages of individual reporters who are covering the campaign and writing some of the stories you’re being shown.

That probably sounds similar to what we’ve become accustomed to seeing on Twitter and Facebook. But neither of those sites is fueled by the same combination of search and network. Twitter search looks at everybody’s tweets; Facebook’s news feed just flows on by, driven by Facebook’s algorithms rather than your search interests. Google has the search DNA, the news-crawling infrastructure, and at least the start of the network knowledge that could combine to make something new.

Those are pretty powerful assets and remain the reason Google still inspires animosity in some news executives and antitrust lawyers. A Google News Plus Your World (let’s call it “Google News+” for the sake of brevity) could in theory provide users a social news service that (whether they like it or not) knows about their browsing habits and social graph while also filtering “news” product out of the larger mass of the web.

Google News already provides tools for customization, but most (though not all) rely on users’ being interested in fiddling — yes to business, no to entertainment, more Wall Street Journal, less Cat Fancy. The genius of the social news feed is that it bases those decisions on network data. You just need to build a network.

But let’s stretch the speculation just a bit further: the toggle. One of the more clever aspects of Search Plus is the ability to shift back and forth between normal and personalized search with one little switch. The toggle is a way to dodge the backlash that comes when any product is seemingly irreversibly redesigned (and built on seemingly self-serving decisions of the sort that get lawyers involved).

For a notional Google News+, that little switch would be a dividing line for readers between the fire hose and a curated feed. It could also be, as Steven Levy points out over at Wired, the pin that pops Eli Pariser’s Filter Bubble. That kind of freedom, to jump back and forth between the personalized feed and the raw stream, is not common on news sites. The toggle is a promise of serendipity, but also the comfort of your favorite, trusted news service, which, if you are someone still mourning the transformation of Google Reader, could harken back to the glory days of the RSS service.
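
To make the speculation concrete, here’s a toy ranking sketch, purely illustrative and in no way something Google has described, of how a notional Google News+ might blend crawler relevance with social-graph signals behind exactly that kind of toggle:

    from dataclasses import dataclass

    @dataclass
    class Story:
        title: str
        relevance: float    # query-match score from the news crawler
        friend_shares: int  # shares by people in your circles

    def rank(stories, personalized: bool, social_weight: float = 0.5):
        """Order stories; the `personalized` flag is the toggle."""
        def score(s: Story) -> float:
            if not personalized:
                return s.relevance  # the raw fire hose
            return s.relevance + social_weight * s.friend_shares
        return sorted(stories, key=score, reverse=True)

    feed = [Story("Romney wins straw poll", 0.9, 0),
            Story("A friend-shared Romney takedown", 0.6, 4)]
    for story in rank(feed, personalized=True):
        print(story.title)

Flip `personalized` to False and the order inverts: that one boolean is the difference between the filter bubble and the fire hose.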

One bit of groundwork is already laid for something like Google News+: the integration of journalists’ Google+ profiles into Google News. Connecting authorship to identity is an ongoing Google interest, beginning with the authorship markup that spotlighted writers in search results. As Google tries to integrate social into every corner of its business, a fully social-fied Google News would seem like a logical next step.

January 10 2012

15:20

The Top 10 Data-Mining Links of 2011

Overview is a project to create an open-source document-mining system for investigative journalists and other curious people. We've written before about the goals of the project, and we're developing some new technology, but mostly we're stealing it from other fields.


The following are some of the best ideas we saw in 2011, the data-mining work that we found most inspirational. Many of these links are educational resources for learning about specific technologies. Some of this work illuminates how algorithms and humans treat information differently. Others are just amazing, mind-bending work.

1. What do your connections say about you? A lot. It is possible to accurately predict your political orientation solely on the basis of your network on Twitter. You can also work out gender and other things from public information.

2. Free textbooks from Stanford University. "Introduction to Information Retrieval" teaches you how a search engine works, in great detail. "Mining Massive Data Sets" covers a variety of big-data principles that apply to different types of information.

3. We're not above having a list of lists. Here's the Data Mining Blog's top 5 articles. Most of these are foundational, covering basic philosophy and techniques such as choosing variables, finding clusters, and deciding what you're looking for.

4. The MINE technique looks for patterns between hundreds or thousands of variables -- say, patterns of gene expression inside a single cell. It's very general, and finds not only individual relationships but networks of cause and effect. Here's a nifty video, here's the original paper, and here's one statistician's review.

5. This is one of those papers that really changed the way I look at things. How do we know when a data visualization shows us something that is "actually there," as opposed to an artifact of the numbers? "Graphical Inference for Infovis" provides one excellent answer, based on a clever analogy with numerical statistics.

6. Lots of text-mining work uses "clustering" or "classification" techniques to sort documents into topics. But doesn't a categorization algorithm impose its own preconceptions? This is a deep issue, which you might think of as "framing" in code. To explore this question, Justin Grimmer and Gary King went meta with a system that visualizes all possible categorizations of a document set, and how they relate. (For a feel for the choices involved, see the toy clustering sketch after this list.)

7. A few years ago Google showed that the number of searches for "flu" was a great predictor of the actual number of outbreaks in a given location -- faster and more specific than the Centers for Disease Control's own surveillance data. The team has now expanded the technique into Google Correlate, which instantly scans through petabytes of data to find search terms that follow any user-supplied time series. Here's New Scientist taking it for a test drive. (A toy version of this correlation ranking also appears after the list.)


8. Not content with free professional textbooks, Stanford has created two free online courses on machine learning and natural language processing. Both are live-streamed lecture series taught by experts, with homework. Learning these intricate technologies has never been easier.

9. Lots of people have speculated about the role of social media in protest movements. A team of researchers looked at the data, analyzing a huge set of tweets from the "May 20" protests in Spain last year. How do protests spread from social media? Now we have at least one solid answer.

10. And the craziest data-mining link we ran across in 2011: IBM's DeepQA project, which beat human Jeopardy champions. This project looks into an unstructured database to correctly answer about 80% of all general questions posed to it, in just a few seconds. Here's a TED talk, and here's the technical paper that explains how it works. I can't tell you how badly I want one of these in the newsroom. If enough journalist hackers build on each other's work, maybe one day ...
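
For anyone who wants to poke at the framing question in item 6, here's a minimal document-clustering sketch using scikit-learn (a toy, not the Grimmer/King system). Every choice baked into it, from the tf-idf weighting to the stopword list to the number of clusters, is a preconception expressed in code:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    docs = ["budget cuts hit local schools",
            "council votes on school funding",
            "quarterback traded before playoffs",
            "injury report ahead of the playoffs"]

    # The "framing" lives in these choices: tf-idf features, English
    # stopwords, and exactly two clusters.
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    for label, doc in sorted(zip(labels, docs)):
        print(label, doc)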
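
And the heart of item 7, minus the petabytes, fits in a dozen lines: rank candidate search-term time series by how well they track the series you supply. A toy version with numpy and invented weekly counts (Google's actual matching is surely fancier than plain Pearson correlation):

    import numpy as np

    # The time series you supply: weekly outbreak counts (invented).
    target = np.array([2, 5, 9, 14, 8, 3], dtype=float)

    # Candidate search-term series over the same weeks (also invented).
    candidates = {
        "flu symptoms":  np.array([3, 6, 10, 15, 9, 4], dtype=float),
        "cough syrup":   np.array([4, 5, 11, 13, 10, 5], dtype=float),
        "beach rentals": np.array([9, 7, 5, 2, 6, 10], dtype=float),
    }

    # Rank terms by Pearson correlation with the target series.
    for term, series in sorted(candidates.items(),
                               key=lambda kv: np.corrcoef(target, kv[1])[0, 1],
                               reverse=True):
        print(f"{np.corrcoef(target, series)[0, 1]:+.2f}  {term}")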

Happy data mining! We'll be releasing our own prototype document-mining system, and the source, at the NICAR conference next month. If these are the sorts of algorithms you like to play with, we're also hiring programmers who want to bring these sorts of advanced techniques within everyone's reach.

October 13 2010

16:00

From tornadoes to Noxema: Hany Farid on using digital forensics to assess the authenticity of photos

Late last month, when tornado-like, end-times-are-nigh-style winds sliced through New York City, Time magazine posted to its NewsFeed an image of a twister forming in the waters beyond the Statue of Liberty — menacing, dark, grainy. The story the mag published — “Gotham Tornado: Amazing Photo of Twister Passing Statue of Liberty” — let the image in question pretty much speak for itself, with the only text accompanying the photo being its headline and twelve SEO-friendly tags. Turns out, though, that the graininess of the image was a symptom not of camera-phone authenticity, but of old age: The photo was shot in 1976.

Time’s mistake — the overzealous posting of an image that, given its context, would seem to be authentic — is one that almost any news organization could make. How, after all, do we check the accuracy of news images, particularly in moments of breaking-news urgency? Though we’re seeing a blossoming of fact-checking in text-based journalism, we have yet to see an equivalent movement in image-based news reporting — mostly, of course, because we lack good tools for determining whether images are authentic or manipulated, whether they depict what they claim to or something else entirely. Which is disturbing, given that we look to images — the raw material of the world, supposedly filtered only through a camera’s lens — to give us an unvarnished view of human events.

Enter Hany Farid. A computer science professor at Dartmouth, Farid is a pioneer in the field of digital forensics, figuring out how to analyze images to determine their authenticity. (Think CSI: Photojournalism.) In his day-to-day work, Farid deals with the algorithmic aspects of photo manipulation — how to translate light and shadow, for example, into data sets that will detect whether a particular image could actually exist in the real world.

“The issues of photo manipulation are going well beyond just the technical, mathematical, geek stuff,” Farid notes. “We’re struggling as a society to deal with what happens when this thing that we have learned to trust over so many years” — the photographic image — “becomes incredibly untrustworthy. And that, to me, is extremely interesting.”

The biggest element of mistrust is the increasing prevalence of manipulation, via Photoshop and other tools. While, most often, those programs are used to create basic composites, or ironically derivative images (cf: Rebecca Gayheart’s Noxemas, courtesy of Gawker), more and more, they’re also being used to produce composites designed to mislead the viewer. Take a tabloid photo of Angelina Jolie and Brad Pitt on a beach, which seems, at first, true to its caption: “CAUGHT TOGETHER!” Study the image closely, though — actually analyze it — and it becomes apparent that the photo is doctored: Its stars are lit from opposite sides. The image is a combination of two separate photos of the couple merged together through the magic of image-layering. (See more about this photo here.)
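
Farid’s lighting-consistency analysis involves serious math, but one crude, widely used forensic heuristic is easy to sketch. To be clear, this is error level analysis, not Farid’s method, and the filename below is invented: recompress a JPEG and amplify the difference, since a region pasted in from another photo often recompresses differently than its surroundings.

    from io import BytesIO
    from PIL import Image, ImageChops

    def error_level(path: str, quality: int = 90) -> Image.Image:
        """Recompress a JPEG and return the amplified difference.

        Regions that respond very differently from their surroundings
        merit a closer look. Suggestive, never conclusive.
        """
        original = Image.open(path).convert("RGB")
        buf = BytesIO()
        original.save(buf, "JPEG", quality=quality)
        buf.seek(0)
        recompressed = Image.open(buf).convert("RGB")
        diff = ImageChops.difference(original, recompressed)
        return diff.point(lambda value: min(255, value * 20))  # amplify

    error_level("tabloid_beach_photo.jpg").show()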

One problem is that our brains simply don’t seem to be wired for the kind of bit-by-bit observation such analysis requires. On the contrary: When it comes to images, our minds — which tend to interpret images as singular units, rather than composites of atomic ones — can often abet manipulation. “Your brain is remarkable, but it has some pretty serious limitations,” Farid says. And one of those limitations is image assessment. On sabbatical last year in Berkeley, he worked with neuroscientists and vision scientists, studying the limits of the visual system. In general, “people tend to over-trust their eye…and that’s a very dangerous game to be playing.”

Now, photo-tampering is becoming so prevalent, Farid notes, that mistrust of images is slowly becoming our default. “There’s almost a backlash,” he says, “and now there’s this over-skepticism of everything out there. It’s amazing.”

Digital forensics — shifting the analysis of images from an art to a science — is one way for images to earn back our trust. Forensic techniques “can help bring some sanity to this.” And the field “just continues to get more and more sophisticated,” Farid notes. “We’re able to do things today that a few years ago seemed unimaginable. And a few years from now, we’ll do something that, today, seems unimaginable.” Farid and his team have been developing software that can be used — by news organizations, in particular — to analyze the authenticity of images before publishing them. It’ll be a few years before that software is ready to be used, he notes; but “we are just getting to the stage where, I think, commercialization is viable.”

That’s a good thing, because the need for rigorous analysis of images is, and will continue to be, increasingly urgent. “It used to be, if you had a handful of photojournalists around the country, you could have some kind of quality control,” Farid points out. But now — “when everybody’s got a cell phone with video and images and they’re posting on their blogs, and to Twitter and YouTube” — the ethics of photojournalism are being tested and shifted. An analytic platform could bring a sense of universality back to those principles. “The issue with photo manipulation is that it’s not black-and-white,” Farid notes. “There are certain types of photo manipulation which are completely acceptable — and there are other ones that are completely unacceptable.” Images have always held a powerful place in journalism, of course, and “you’ve always been able to editorialize with photographs.” But now, Farid says, “it’s a question of degree.” Now, using images, “you can really change the entire story. And that, obviously, is a very different beast.”

September 14 2010

18:30

“Squeezing humanity through a straw”: The long-term consequences of using metrics in journalism

[Here's C.W. Anderson responding to the same subject Nikki Usher wrote about: the impact of audience data on how news organizations operate. Sort of a debate. —Josh]

One way to think about the growing use of online metrics in newsrooms (a practice that has been going on forever but seems to have finally been noticed of late) is to think about it as part of a general democratization of journalism. And it’s tempting to portray the two sides to the debate as (in this corner!) the young, tech-savvy newsroom manager who is finally listening to the audience, and (in this corner!) the fading fuddy-duddy-cum-elitist more concerned with outdated professional snobbery than with what the audience wants.

Fortunately, actual working journalists rarely truck in such simplistic stereotypes, arguing rightly that there isn’t a binary divide between smart measurement and good journalism. As Washington Post executive producer and head of digital news products Katharine Zaleski told Howard Kurtz:

There’s news we know people should read — because it’s important and originates with our reporting — and that’s our primary function…But we also have to be very aware of what people are searching for out there and want more information on…If we’re not doing that, we’re not doing our jobs.

Or as Lab contributor Nikki Usher put it: “[I]f used properly, SEO and audience tracking make newsrooms more accountable to their readers without dictating bad content decisions — and it can help newsrooms focus on reader needs.”

At the level of short-term newsroom practices, I agree with Usher, Zaleski, and every other journalist and pundit who takes a nuanced view of the role played by newsroom metrics. So if you’re worried about whether audience tracking is going to eliminate quality journalism, the quick answer is no.

My own concerns with the increased organizational reliance on metrics are more long-term and abstract. They have as much to do with society as with journalism per se. They center on:

— the manner in which metrics can serve as a form of newsroom discipline;
— the squishiness of algorithmically-afforded audience understanding;
— the often-oversimplistic ways we talk about the audience (under the assumption that we’re all talking about the same thing); and, finally
— the way that online quantification simplifies our understanding of what it means to “want” information.

Big topics, I admit. Each of these points could be the subject of its own blog post, so for the sake of space, I want to frame what I’m talking about by dissecting this seemingly innocuous phrase:

“We know what the audience wants.”

Let’s look at the words in this sentence, one at a time. Each of them bundles in a lot of assumptions, which, when examined together, might shed light on the uses and the potential long-term pitfalls of newsroom quantification.

“We”: Who is the “we” that knows what kind of journalism the audience wants? Often, I’d argue, it’s executives in our increasingly digitized newsrooms who now have a powerful tool through which to manage and discipline their employees. In my own research, I’ve discovered that the biggest factor in determining the relationship between metrics and editorial practices is the way these metrics are utilized by management, rather than the presence or absence of a particular technology. Philosopher Michel Foucault called these types of practices disciplinary practices, and argued that they involved three primary types of control: “hierarchical observation, normalizing judgment, and the examination.” Perhaps this is fine when we’re trying to salvage a functional news industry out of the wreckage of a failed business model, but we should at least keep these complications in mind — metrics are a newsroom enforcement mechanism.

“Know”: Actually, we don’t know a whole lot about our audiences — but there’s a lot of power in claiming that we know everything. In other words, the more data we have, paradoxically, the less we know, and the more it behooves us to claim exactitude. While smart thinkers have been writing about the problem of poor web metrics for years, a major new report by the Tow Center for Digital Journalism at Columbia has thrown the issue into stark relief. As report researcher (and, full disclosure, friend and colleague) Lucas Graves writes:

The Web has been hailed as the most measurable medium ever, and it lives up to the hype. The mistake was to assume that everyone measuring everything would produce clarity. On the contrary, clear media standards emerge where there’s a shortage of real data about audiences…The only way to imbue an audience number with anything like the authority of the old TV ratings is with a new monopoly — if either Nielsen or comScore folds or, more likely, they merge. That kind of authority won’t mean greater accuracy, just less argument.

There’s a circular relationship here between increased measurement, less meaningful knowledge, and greater institutional power. When we forget this, we can be uncritical about what it is metrics actually allow us to do.

“The Audience”: What’s this thing we insist we know so much about? We call it the audience, but sometimes we slip and call it “the public.” But audiences are not publics, and it’s dangerous to claim that they are. Groups of people connected by the media can be connected in all sorts of ways, for all sorts of reasons, and can be called all sorts of things; they can be citizens united by common purpose, or by public deliberation. They can be activists, united around a shared political goal. They can be a community, or a society. Or they can be called an audience.

I don’t have anything at all against the notion of the audience, per se — but I am concerned that journalists are increasingly equating the measurable audience (a statistical aggregate connected by technology, through consumption) with something bigger and more important. The notion that we know the desires and preferences of this formerly shadowy and hidden group of strangers is seductive, and it’s often wrong.

“Wants”: Finally, what does it mean to want a particular piece of information? As Alexis Madrigal notes in this short but smart post at The Atlantic, informational want is a complicated emotion that runs the risk of being oversimplified by algorithms. Paradoxically, web metrics have become increasingly complex at the same time they’ve posited increasingly simplistic outcomes. They’re complex in terms of their techniques, but simple in terms of what it is we claim they provide us and in the ultimate goal that they serve. Time on site, engagement, pageviews, uniques, eye movement, mouse movement — all of these ultimately boil down to tracking a base-level consumer desire via the click of a mouse or the movement of the eye.

But what do we “want”? We want to love a story, to be angry about it, to fight with it, to be politically engaged by it, to feel politically apathetic towards it, to let it join us together in a common cause, for it to make us laugh, and for it to make us cry. All of these wants are hard to capture quantitatively, and in our rush to capture audience data, we run the risk of oversimplifying the notion of informational desire. We run the risk of squeezing humanity through a digital straw.

So — will an increasing use of online metrics give us bad journalism? No.

Will they play a role in facilitating, over the long term, the emergence of a communicative world that is a little flatter, a little more squeezed, a little more quantitative, more disciplinary, more predictive, and less interesting? They might. But take hope: Such an outcome is likely only if we lose sight of what it is that metrics can do, and what it is about human beings that they leave out.

August 25 2010

13:30

Googling serendipity: How does journalism fare in a world where algorithms trump messy chance?

Twelve years ago, when I was reporting on the pending Microsoft antitrust case, I learned that what was really at stake wasn’t immediately apparent in the legal briefs. It wasn’t the browser market (remember Netscape?) or whether Windows should be able to run somebody else’s word-processing program. Rather, it was how control was exercised over the places where we learned, created, and engaged in critical thought.

One of the best thinkers on the topic was Ben Shneiderman, founding director of the Human-Computer Interaction Lab at the University of Maryland. He told me at the time that the critical question for Microsoft was not whether the company encouraged innovation — it did — but rather how financial pressures dictated which innovations it adopted and which it let wither. The Microsoft software suite, he noted, wasn’t very accessible to people with learning disabilities or those with low incomes.

Fast forward to 2010, and now we hear from Eric Schmidt, CEO of Google, another powerful technology company that controls the tools of creativity and expression. Schmidt recently talked to The Wall Street Journal about the potential for applying artificial intelligence to search, suggesting that the search engine of the future would figure out what we meant rather than find what we actually typed.

Schmidt seems to be pushing the idea that the future — or, more accurately, each of our individual futures, interests, and passions — all can be plotted by algorithm from now until our dying day. The role of serendipity in our lives, he said, “can be calculated now. We can actually produce it electronically.”

Really?

According to Webster’s, serendipity is “the faculty or phenomenon of finding valuable or agreeable things not sought for.” So if the essence of serendipity is chance or fortune or chaos, then by definition, anything that a search engine brings to you, even on spec, isn’t serendipitous.

I don’t know whether Schmidt’s comments should be chalked up to blind ambition or to quant-nerd naivete. But it’s troubling that Schmidt seems to discount the role that human nature plays in our everyday lives and, ultimately, in guiding our relationships with technology.

It might be that Schmidt’s vision for the search engine of the future would serve us well in finding a new restaurant, movie or book. But if Google really wants to take the guesswork out of our lives, we should be asking the same question that Shneiderman put to Microsoft. How might financial pressures shape Google’s “serendipity algorithm”? What content — journalism and otherwise — will it push our way that will shape our worldview? And, to Shneiderman’s point, what limits does it impose?

I think it’s safe to say that some good ideas don’t lend themselves to being monetized online — witness the rise of nonprofit startups in bringing us investigative, public affairs, and explanatory journalism. How might they fare in Schmidt’s world order?

I caught up with Shneiderman on Monday, and he agreed that this is one of the key questions that should be debated as we depend more and more on a “recommender system” in which companies like Google or Amazon use massive databases to anticipate our needs and wants. Public interest groups and other nonprofits that can’t afford the right keywords could be most vulnerable in these systems, Shneiderman said. “How far down the list do the concerns of civic groups get pushed?” he asked.
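
For readers who’ve never looked inside one, a “recommender system” can be boiled down to a toy: score the items you haven’t consumed by their similarity to the ones you have. A minimal item-based sketch with numpy, using invented ratings at nothing like production scale:

    import numpy as np

    # Rows are users, columns are items; 0 means "not yet consumed."
    ratings = np.array([[5, 4, 0, 0],
                        [4, 5, 1, 0],
                        [1, 0, 5, 4]], dtype=float)

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    def recommend(user: int) -> int:
        """Suggest the unseen item whose rating column most resembles
        the column of the user's current favorite."""
        favorite = int(np.argmax(ratings[user]))
        unseen = [j for j in range(ratings.shape[1]) if ratings[user, j] == 0]
        return max(unseen,
                   key=lambda j: cosine(ratings[:, favorite], ratings[:, j]))

    print("recommend item", recommend(0), "to user 0")

Whatever civic content doesn’t resemble anything a user has clicked before simply never scores well, which is Shneiderman’s worry in miniature.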

It’s fair to ask companies what considerations and factors might be weighted in their search formulas, Shneiderman said, but it isn’t clear what level of transparency should be expected. “What is a reasonable request to make without exposing their algorithm and their business practices?” he said.

I can’t say either. But I do think there are some lessons that Google can take from the history that Microsoft has helped write.

One lesson is that what’s good for the bottom line doesn’t always jibe with what’s best for consumers. A dozen years ago, the Netscape browser was regarded by many as more functional, but Microsoft saw it as a threat. So it bundled its own Internet Explorer browser into its operating system and effectively pushed Netscape out of existence.

Another lesson is that it isn’t always possible to divine what people will want in the future based on a profile of what they (or people like them) have wanted in the past. Indeed, some of the most successful technology companies — Google included — have succeeded precisely because their vision for the future was radical, new, and compelling. Microsoft once played that role to a monolithic IBM. But today, as Microsoft’s market valuation has been eclipsed by that of Apple, it has become debatable whether Microsoft remains a consumer-driven company.

None of this should be interpreted as an anti-capitalistic rant. We’re all better off for Google’s search box, and it’ll be interesting to see where Schmidt’s vision takes the company.

Rather, it is a suggestion that even the most elaborate algorithms and high-touch e-marketing can’t address every human need.

One of the best vacations I ever took was when I pulled out of my driveway in Raleigh in late August 1991 with no particular destination. Two days later, I found myself in North Dakota, discovering places I never would have appreciated based on my past interests or those of my friends and peers. The experience was so compelling to me precisely because it was serendipitous.

That trip has served as an important reminder to me ever since. When we don’t know what we want, sometimes what we really need is to figure it out for ourselves.

July 27 2010

16:30

When do 92,000 documents trump an off-the-record dinner? A few more thoughts about Wikileaks

Sometimes you can spend an entire morning racing the clock to put together the perfect blog post, and once you’re done, find a quote or two that would have let you sum up the entire thing in a lot less time. Such is the case with this great exchange between veteran reporter Tom Ricks (now blogging at Foreign Policy magazine) and David Corn at Mother Jones. Ricks pretty much trashed the “War Logs“/Wikileaks story that has been the buzz of the journalism world for the past few days, and dropped this gem:

A huge leak of U.S. reports and this is all they get? I know of more stuff leaked at one good dinner on background.

David Corn responded with a thoughtful post that is worth reading in full. The essence of it, however, is this:

These documents — snapshots from a far-away war — show the ground truth of Afghanistan. This is not what Americans receive from US officials. And with much establishment media unable (or unwilling) to apply resources to comprehensive coverage of the war, the public doesn’t see many snapshots like these. Any information that illuminates the realities of Afghanistan is valuable.

This captures the essence of the question I was trying to get at in the fifth point of yesterday’s post (“journalism in the era of big data”). I noted the similarities between “War Logs” and last week’s big bombshell, “Top Secret America.” The essence of the similarity, I said, was that they were based on reams of data, which, in sum, might not tell us anything shockingly new but that brought home, in Ryan Sholin’s excellent phrase, “the weight of failure.” And this gets me excited because I think it represents something new in journalism, or something old enough to feel new again: a focus on the aggregation of a million “on the ground” reports that might sometimes get us closer to the truth than three well-placed sources over a nice off-the-record dinner. And I’m fascinated by this because it’s an approach that I, as a qualitative social scientist, have always regarded as a particularly valid way to learn about the world.

Ricks’ quote, on the other hand, captures a certain strain of more traditional thinking: the point of journalism is to learn something shockingly new, hopefully from those elites in a position to really know what’s going on. Your job, as a journalist, is to get close enough to those elites so that they’ll tell you what’s really going on (a “nice” dinner, now, not just any old dinner!), and your skill as a journalist lies in your ability to hone your bullshit detector so that you can separate the self-serving goals of your sources from “the truth.” Occasionally, those elites will drop a big stack of documents on your desk, but that’s a rare occurrence.

I want to be clear: I don’t think one “new” type of journalism is going to displace the traditional way. Obviously, both journalistic forms will work together in tandem; indeed, it seems like most of what The New York Times did with “War Logs” was to run the data dump by its network of more elite sources for verification and context. But we are looking at something different here, and I think the Ricks-Corn exchange captures an important tension at the heart of this transition.

To conclude, two more reading links for you. In the first, “A Speculative Post on the Idea of Algorithmic Authority,” Clay Shirky wrote late last year that the authority system he sees emerging in a Google-dominated world values crap as much as it does quality.

Algorithmic authority is the decision to regard as authoritative an unmanaged process of extracting value from diverse, untrustworthy sources, without any human standing beside the result saying “Trust this because you trust me.”

This notion gets at the fact that a lot of the documents contained in the “War Logs” trove might have been biased, or partial, or flat-out wrong. But it doesn’t matter, Shirky might argue, in the same way that it might in the world that Ricks describes — a world where, in Shirky’s terms, an elite source is “standing beside the result saying ‘Trust this because you trust me.’”

The second link is a little more obscure. In her book How We Became Posthuman, N. Katherine Hayles argues that one of the major consequences of digitization is that we, as an informational culture, no longer focus as much on the distinction between presence and absence (“being there,” or not “being there”) as we do on the difference between pattern and randomness. In other words, “finding something new” (being there, being at dinner, getting the source to say something we didn’t know before) may not always be as important as finding the pattern in what is there already.

This is a deep point, and I can’t go into it much more in this post. But I’m thinking a lot about it these days as I ponder new forms of online journalism, and I’ll probably write about it more in the months and years ahead.

June 17 2010

14:00

“A super sophisticated mashup”: The semantic web’s promise and peril

[Our sister publication Nieman Reports is out with its latest issue, and its focus is the new digital landscape of journalism. There are lots of interesting articles, and we're highlighting a few. Here, former Knight Fellow Andrew Finlayson explains the role of journalists in the semantic web. —Josh]

In the movie Terminator, humanity started down the path to destruction when a supercomputer called Skynet started to become smarter on its own. I was reminded of that possibility during my research about the semantic web.

Never heard of the semantic web? I don’t blame you. Much of it is still in the lab, the plaything of academics and computer scientists. To hear some of them debate it, the semantic web will evolve, like Skynet, into an all-powerful thing that can help us understand our world, or create various crises when it starts to develop a form of connected intelligence.

Intrigued? I was. Particularly when I asked computer scientists how this concept could change journalism in the next five years. The true believers say the semantic web could help journalists report complex, ever-changing stories and reach new audiences. The critics doubt the semantic web will be anything but a high-tech fantasy. But even some of the doubters are willing to speculate that computers using pieces of the semantic web will increasingly report much of the news in the not-too-distant future.

Keep reading at Nieman Reports »

June 10 2010

22:23

Google News experiments with human control, promotes a new serendipity with Editors’ Picks

Late this afternoon, Google News rolled out a new experiment: Editors’ Picks. Starting today, a small percentage of Google News users will find a new box of content with that label, curated not by Google’s news algorithm, but by real live human news editors at partner news organizations. Here’s an example, curated by the editors of Slate:

Per Google’s official statement on the new feature:

At Google, we run anywhere from 50 to 200 experiments at any given time on our websites all over the world. Right now, we are running a very small experiment in Google News called Editors’ Picks. For this limited test, we’re allowing a small set of publishers to promote their original news articles through the Editors’ Picks section.

That by itself is a remarkable shift for a website that, at its launch in 2002, proudly included on every page: “This page was generated entirely by computer algorithms without human editors. No humans were harmed or even used in the creation of this page.”

But Google’s statement very much understates the feature’s (potential) significance. You know how Cass Sunstein wanted to build an “architecture of serendipity” that would give readers important but surprising information? And how, increasingly, many news thinkers have come to believe that systematizing serendipity is not so much a contradiction as a democratic necessity? Well, this is a step — small, but certain — in that direction. Think of Editors’ Picks as a Spotlight-like feature that, instead of highlighting “in-depth pieces of lasting value,” shines a light on what editors themselves have deemed valuable. 

In that sense, Editors’ Picks — currently being run in partnership with fewer than a dozen news outlets, including The Washington Post, Newsday, Reuters, and Slate — could recreate the didn’t-know-you’d-love-it-til-you-loved-it experience of the bundled news product within the broader presentation of Google News’ algorithmically curated news items. Serendipity concerns exist even at Google (see Fast Flip, for example); this is one way of replicating the offline experience of serendipity-via-bundling within the sometimes scattered experience of online news consumption.

Editors’ Picks also does what its name suggests: it allows editors to choose which stories they introduce to the Google News audience. (Google confirmed to me the links on display aren’t being paid for by the news publishers — that is, it’s not a sponsored section.) Publishers can choose to promote stories that have done well, traffic-wise, amplifying that success — or they can choose to promote stories that have gotten less traction. Or they can simply choose to promote stories that are funny or important or touching or all of the above — stories that are simply worth reading. The point is, they can choose.

Which is, of course, of a piece with Google’s renewed focus on the news side of its search functionalities — and its effort to reach out to the news organizations. And it’s of a piece with other sites that have moved from automated news to automation-plus-human-editing.

Consumers, for their part, get some choice in the matter as well: The Editors’ Picks experiment combines algorithmically curated content with content selected by news organizations themselves — algorithmic authority and editorial judgment — within the same news presentation.

In other words: serendipity, systematized.

December 11 2009

10:39

PDA: ‘Algorithms will replace journalists’

Heavy going for a Friday morning, but an interesting read: PDA has an interview with Frank Schirrmacher, publisher of the German newspaper Frankfurter Allgemeine Zeitung, in which he says:

“With the internet, we are experiencing the industrialisation of information and communication. Algorithms are used more and more to produce information that used to be created by journalists, or humans in general. More and more of these algorithms are being used to find out what people are thinking.

(…) The path we face in journalism is one in which there are fewer humans and more machines – and if you look at all the inaccurate news reports already, that is grotesque.”

Investing in ‘the human factor’ – and therefore not succumbing entirely to algorithms – will mark out the successful media companies of the future, argues Schirrmacher.

Full interview at this link…
