Tumblelog by Soup.io

April 01 2013

14:18

Not an April Fool’s joke: The New York Times has built a haiku bot


New York Times senior software architect Jacob Harris has a thing for robots and wordplay. You may recall he’s the guy behind @nytimes_ebooks, the Times’ answer to the elusive and inscrutable Twitter bot @Horse_ebooks.

So it’s only natural that Harris has now created an algorithm that extrudes haiku out of the text of Times stories. In other words:

Haiku harvester
built inside The New York Times —
does it have a soul?

(If my eighth-grade English teacher is reading this: sorry.)

Here’s a better, more Times-y example:

[Image: timeshaiku1]

Times Haiku is a collection of what the Times is calling “serendipitous poetry,” derived from stories that have made the homepage of NYTimes.com. The haiku live on a Tumblr hosted by the Times. Harris built a script that mines stories for haiku-friendly words and then reassembles them into poetry. (For those of you who may have zoned out in class, haiku consist of three lines with, in order, five, seven, and five syllables.) The code checks words against an open source pronunciation dictionary, which handily also contains syllable counts.
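The syllable check described above can be sketched in a few lines of Python. This is a minimal illustration, not Harris's code: it assumes a CMU-style dictionary format, in which every vowel phoneme ends in a stress digit, and it uses a tiny hand-made table (`PRONUNCIATIONS`) in place of a real dictionary. The function names are hypothetical.

```python
import re

# Tiny stand-in for an open source pronunciation dictionary (assumed
# CMU-style format: vowel phonemes carry a stress digit 0, 1, or 2).
PRONUNCIATIONS = {
    "haiku": ["HH", "AY1", "K", "UW0"],
    "harvester": ["HH", "AA1", "R", "V", "AH0", "S", "T", "ER0"],
    "built": ["B", "IH1", "L", "T"],
    "inside": ["IH2", "N", "S", "AY1", "D"],
    "the": ["DH", "AH0"],
    "new": ["N", "UW1"],
    "york": ["Y", "AO1", "R", "K"],
    "times": ["T", "AY1", "M", "Z"],
    "does": ["D", "AH1", "Z"],
    "it": ["IH1", "T"],
    "have": ["HH", "AE1", "V"],
    "a": ["AH0"],
    "soul": ["S", "OW1", "L"],
}

def count_syllables(word):
    """Return the syllable count for a word, or None if it is unknown."""
    phonemes = PRONUNCIATIONS.get(word.lower())
    if phonemes is None:
        return None
    # One syllable per vowel phoneme, i.e. per phoneme ending in a digit.
    return sum(1 for p in phonemes if p[-1].isdigit())

def split_into_haiku(sentence):
    """Split a sentence into 5/7/5-syllable lines, or return None."""
    targets = [5, 7, 5]
    words = re.findall(r"[A-Za-z']+", sentence)
    lines, current, total = [], [], 0
    for word in words:
        if len(lines) == len(targets):
            return None  # words left over after the third line
        syllables = count_syllables(word)
        if syllables is None:
            return None  # unknown word, so the count cannot be trusted
        current.append(word)
        total += syllables
        target = targets[len(lines)]
        if total == target:
            lines.append(" ".join(current))
            current, total = [], 0
        elif total > target:
            return None  # a word straddles a line break
    if len(lines) == len(targets) and not current:
        return lines
    return None
```

Run on the haiku at the top of this post, `split_into_haiku` breaks the text at the same points as the three printed lines; any sentence containing a word missing from the table is rejected outright.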

“Sometimes it can be an ordinary sentence in context, but pulled out of context it has a strange comedy or beauty to it,” Harris said.

Harris was inspired by Haikuleaks, a similar project that found poetry in the cache of diplomatic cables released by WikiLeaks in 2010. The backbone of that project was an open source program called Haiku Finder, which crawls through text to generate haiku. The program was built in Python; Harris made his own version in Ruby on Rails.

The result, much like @nytimes_ebooks, is bizarre, quirky, and kind of zen. The haiku have a strange way of getting at the heart of a story, or teasing out interesting fragments from an article. “There’s something appealing about finding these snippets of text, these turns of phrase and pulling them out,” Harris said. “You find it compelling and it drives you to read the article that it came from.” (Think of it as a more lexicographically strict version of Paul Ford’s SavePublishing.)

In its own poetic way, Times Haiku will be another access point for Times stories, said Marc Lavallee, assistant editor for interactive news at the Times. “If someone sees the site, or the image of an individual haiku and shares it on Tumblr, and it gets them to think about who we are and what we do, or gives them a moment of pause, I think we’ve succeeded in a way,” Lavallee said.

Lexi Mainland, social media editor for the Times, said they wanted the poems to be able to stand on their own and be readily sharable. That’s why the haiku are actually images, which fits well with the aesthetic of Tumblr, she said. Outside of Tumblr, the Times will promote the haiku through the paper’s flagship Twitter account.

That the Times has the ability to build a haiku bot isn’t surprising. But why build a haiku bot? “A lot of the projects we work on here are these incredibly big heaves, which are very, very gratifying,” said Mainland. “But you crave these smaller projects, which are just as valuable.” Similarly, projects like the haiku bot may seem silly on the surface, but the underlying code, the use of natural language processing, or other components could be valuable to future projects, Lavallee said.

It helps that the project came at little expense to the Times — Harris put it together on his own during a fit of post-election letdown. Harris had been working on projects connected to the presidential race for over a year, and after election day suddenly found himself with idle hands. He wrote the code in November and began monitoring what it was spitting out. After he showed it to Mainland, Lavallee, and other editors, they gave the project a green light. Designer Heena Ko and software developer Anjali Bhojani gave the haiku their distinctive appearance for Tumblr. (Those lines you see running askew of the text of the haiku? The length is computer generated, based on the meter of the first line of text.)

As whimsical as a haiku bot or a spammy-sounding Twitter bot might be, both are efforts to find new uses for the Times’ vast collection of work. “It’s just this large corpus of text that gets very dizzying to look through,” Harris said.

The Times may also have a soft spot for artwork inspired by the written word. Anyone who has visited the lobby of The New York Times Building has likely seen Moveable Type, an algorithm-backed art installation that displays fragments of Times content across 560 display screens.

But why poetry? For starters, today is the first day of National Poetry Month, Mainland said. (Today is also April Fool’s — and if you were wondering, this is not a joke.) Still, for lovers of verse, it may sound like a cold and bloodless way to create poetry. Can you really create poetry without a soul? Do robots have feelings? Can they really see a sunset, or be moved by the sounds of a whale songs CD?

Harris admits the bot is imperfect; it’s required a little teaching along the way. One reason he limited the scope to the front page is that it provides an editor-picked selection, which tends to be richer features and important daily fare. (Running the bot on the Times Wire, Harris said, often produced haiku made up of basketball scores, which may be too esoteric for any lit major or stat nerd.) The algorithm is designed to toss haiku with certain sentence constructions (sentences that start with a preposition, for instance) or from sensitive stories. Mainland, Lavallee, and Harris also keep an eye on the haiku being created to see if anything untoward sneaks through.
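The rule about sentence constructions might look something like the sketch below. The preposition list and the function names are assumptions made for illustration; the article does not say which words or rules the Times actually uses.

```python
# Hypothetical, deliberately short list of prepositions; the real rule set
# used by the Times is not described in the article.
PREPOSITIONS = {"at", "by", "for", "from", "in", "of", "on", "to", "with"}

def starts_with_preposition(sentence):
    """Return True if the first word of the sentence is a preposition."""
    words = sentence.strip().split()
    if not words:
        return False
    # Strip leading quotation marks or parentheses before comparing.
    first = words[0].strip("\"'“”‘’(").lower()
    return first in PREPOSITIONS

def keep_candidates(sentences):
    """Drop candidate sentences that open with a preposition."""
    return [s for s in sentences if not starts_with_preposition(s)]
```

A filter like this is cheap to run before any syllable counting, so rejected sentences never reach the more expensive step.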

But Harris also has to do some syllable counting himself, teaching the bot words that appear in the Times (“Rihanna,” for instance) that it doesn’t know. Henry Higgins would be proud.
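Teaching the bot a word it does not know can be as simple as a hand-maintained table that is consulted when the dictionary lookup fails. The sketch below assumes that design; the table names and sample entries are hypothetical.

```python
# Stand-in for syllable counts from the pronunciation dictionary (sample entries).
DICTIONARY_SYLLABLES = {"proud": 1, "teacher": 2}

# Hand-entered counts for words the dictionary lacks (hypothetical table).
MANUAL_SYLLABLES = {"rihanna": 3}

def syllables_for(word):
    """Look up a word in the dictionary first, then in the manual table."""
    key = word.lower()
    if key in DICTIONARY_SYLLABLES:
        return DICTIONARY_SYLLABLES[key]
    # Returns None when neither table knows the word.
    return MANUAL_SYLLABLES.get(key)
```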

January 19 2010

20:00

The Search for a New Revenue Model in Journalism

My writing on PBS Idea Lab was introduced to me as a way to publicly discuss the growth of Spot.Us, my Knight News Challenge project. I've received kudos for being honest in my blog posts. I'm comfortable talking about where Spot.Us is falling short, and where we are exceeding expectations. I think we are doing a bit of both -- and trying to adjust to succeed more and fall short less. Hey, that's the nature of an iterative process, which I've always said needs to be at the heart of Spot.Us as a new concept.

So let's keep that bit of honesty alive in this post in order to talk broadly about journalism. (If you just want the updates on Spot.Us, scroll down to the bottom.)

Robert Niles at OJR wrote two recent, fantastic pieces. In the shadow of The Economist's article proclaiming this to be "The year of the pay wall," Niles wrote "there is no revenue model for journalism" and that "doing journalism is an act of community organizing."

Doing journalism as an act of community organizing is something I've been writing/thinking about for a long time -- ever since Assignment Zero first failed. (Its failure only becomes more beautiful and poetic with hindsight.) But I want to focus on Niles' first point.

"There is no revenue model for journalism."

That's not an easy thing to say. Probably not good cocktail conversation at a journalism mixer. But let's entertain Niles for a minute.

He says there are three main ways publishers can make money.

  1. Direct purchases, such as subscriptions (or pay walls), copy sales, and tickets
  2. Advertising
  3. Donations, including direct contributions and grant funding

Niles then proceeds to break down the three and concludes, "Publishers must take a sober look at these three options and decide how best to maximize their income opportunities within them."

Others might disagree with Niles and cite a plethora of other revenue streams (see: How to turn journalists into profit centers), but I don't think we can outright dismiss Niles' point of view by dreaming up other revenue streams outside of these trusted few.

Keep in mind the tone I mentioned earlier: Honesty, both the good and the bad. So let's take a good long look at just the headline. Certainly, Niles didn't mean there were no revenue streams. He simply meant there is no new revenue stream to pluck out of the sky aside from those main three.

But let's take his headline to the extreme for a minute. We can keep these three revenue streams and, as the trends show, entertain the idea that journalism just isn't sustainable. That's what I did in a thought experiment while witnessing the back-and-forth banter of two friends on Twitter, an exchange that was archived by Deanna Zandt.

My response to 'journalism mimics art' in full was captured in a bit of a rant:

... a hard cold truth might be that [journalism] isn't sustainable.

But you know what - even if journalism isn't sustainable in that classic sense it doesn't mean it will disappear. There are plenty of endeavors that have NEVER been sustainable in the true sense of the word.

I use poetry as an example. Poetry in and of itself has never been sustainable in the way we might think of other goods and services.

Are we afraid poetry will die? No. Has it ever even been scarce?

I think we could extend this [lack of sustainability] to almost all of the high arts (as opposed to pop arts).

One of Clay Shirky's most profound and popular posts about newspapers had this to say:

The expense of printing created an environment where Wal-Mart was willing to subsidize the Baghdad bureau. This wasn't because of any deep link between advertising and reporting, nor was it about any real desire on the part of Wal-Mart to have their marketing budget go to international correspondents. It was just an accident.

And while we all agree with the wisdom of this, we seldom take Shirky to task. If Wal-Mart won't subsidize journalism, somebody else must step up. But perhaps whoever that is won't have profits and sustainability in mind.

I'm not proposing that we just give up, all join co-ops and grow dreadlocks (although that would be cool with my internal hippie). What I am suggesting is that, in this age of experimentation, which we all agree is happening, there are certain assumptions we make that steer the direction of our thought.

One of those assumptions, and I claim this all the time, is that there will always be a market for news and information. That marketplace is in flux and hard to pin down at the moment, but people want accurate and thorough news and information. If this assumption is true, then journalism will be sustainable once we figure out the marketplace again and how to "sell" the news.

Compare this to poetry, where there is little demand. There is no robust marketplace and poetry is not "sustainable" in the true sense of the word. Instead, it is traditionally professionalized through patrons of the arts.

The Relative Importance of News and Information

In conversations with people who conduct audience research, I've come to realize that news and information is not as important to the average reader as it is to folks like you and me (bloggers, journalists, news junkies, etc.).

Here's what they tell me: two times a week. That's how often people have the urge to dive into civic issues at the local level. Of those two times, it's unclear whether news and information is even desired, or if it's just the urge to tutor the kids at your local school, or do some public gardening, etc.

I wonder how often people feel the urge to hear poetry?

I don't claim to know any truths about the value of journalism and original reporting. Hey, I'm biased! I'm just suggesting that, as journalists, when we have this discussion we should recognize our bias and tendency for over-valuation.

In that vein, I want to follow this train of thought to an extreme. For me, it's often helpful to think in extreme examples and then determine the factors that lead to one or the other extreme. I could very easily write a blog post where the value of news and information is compared to food (three times a day, please) instead of poetry. Following that path would give us different conclusions.

So fear not! No truth has been discovered in this post -- it's just an attempt to shake things up.

Spot.Us Updates

1. The Redesign is making progress -- It's always slower than you want it to be. But so is transportation. One day, we'll just be able to snap our fingers and, presto-insto, be somewhere new.

2. Pitches are coming in and going through the pipeline -- We still need to figure out a better way to keep pitches on a deadline. We could start deducting money from pitches that go past deadline, but that is a last resort. I'm sure there are other measures we can put into place to make sure deadlines are met. By the way, the newest pitch comes from a very cool Peter Byrne who wants to investigate the UC Regents.

3. The iterative process continues.