Tumblelog by Soup.io

November 03 2010

14:00

It’s people! Meet Soylent, the crowdsourced copy editor

The phrase “on-demand human computation” has a sinister tinge to it, if only because the idea of sucking the brain power out of a group of people is generally frowned upon. And yet, if you call it “crowdsourcing” everything sounds so much friendlier!

But calling Soylent “crowdsourced copy-editing” isn’t quite fair, since the system performs the type of jobs that are somewhere in the gray area between man and machine. More than a spell check, not quite the nightside copy editor versed in AP style, Soylent really is on-demand computation. It’s what all word processors need, the “Can you take a look at this?” button with a small workforce of people at your disposal.

Soylent is an add-in for Microsoft Word that uses Mechanical Turk as a distributed copy-editing system to perform tasks like proofreading and text-shortening, as well as a type of specialized edit its developers call “The Human Macro.” Currently in closed beta, Soylent was created by compsci students at MIT, Berkeley, and the University of Michigan.

For those unfamiliar, Mechanical Turk is an Amazon service that makes it easier for small tasks (and the money to pay for them) to be distributed among a group of humans called Turkers. While savvy writers could already use MTurk to edit their work, the team at Soylent believes their system can produce better and more efficient results than would a writer working alone.

“The idea of Soylent is, what if we could embed human knowledge in the word processor?” MIT’s Michael Bernstein, the lead researcher on Soylent, told me.

That sounds technical, but as Bernstein explains, we all call on friends for help when writing. Research paper, essay, email, story, or blog post — most people rely on a second pair of eyeballs for help at least some of the time. And one thing Mechanical Turk has to offer is a lot of eyeballs.

Soylent’s three current features are called Shortn, Crowdproof, and the Human Macro:

Shortn: Ever write 1,700 words and blow right past your 1,200 word count? Shortn lets writers submit passages of text to MTurk for trimming. They can determine how much they want to cut with a handy slider tool.

Crowdproof: A superpowered, sophisticated spell, grammar and style check that provides suggestions as well as explanations why your choices are wrong.

The Human Macro: For more complicated changes — something like “change all verbs to past tense” — the Human Macro is, as Bernstein says, programming-as-craigslist-ad. The writer describes the changes she wants (capitalization of proper names, altering verb tense, annotating references with Creative Commons photos) in a request form, which humans then act on.

Bernstein argues that Soylent’s cold, detached eye is just what some writing needs. “It’s really hard to kill your own babies in your writing,” Bernstein said. “To be honest, another motivation for me is that it’s very time consuming to go and snip words and cut things from paragraphs an hour before deadline.”

But to writers already nervous about those babies being disappeared on the copy desk, handing over their copy to the faceless masses might not sound like a solution. In their research, Bernstein and his colleagues identified “lazy” and “overeager” individual Turkers, with the lazy ones doing the minimal amount of work and the overeager making wholesale changes. Bernstein said the distributed editing process behind Soylent eliminates this problem because no one Turker is working with whole passages of a document; the work is split among many.

Some in news circles are already experimenting with Mechanical Turk; ProPublica used it to identify companies getting stimulus dollars for the Recovery Tracker project. (Here at the Lab, we use it for the long transcripts we sometimes run of video or audio interviews.) MTurk could be used for any number of tasks that call for on-demand labor. But what makes Soylent different from using MTurk directly is a programming pattern Bernstein and his colleagues created called Find-Fix-Verify, which disseminates tasks across a large group of workers. The only thing required of writers is an Amazon account to pay Turkers; Soylent sets the payment rates.

Instead of one Turker reading over an entire page or paragraph, Soylent asks a group of workers to find areas that need fixing and make corrections. Those fixes are then filtered by other Turkers for inaccuracies, which produces a set of recommendations or an edited graph for the writer. Depending on the job and the document, it usually took Soylent around 40 minutes to complete a task.
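For readers who think in code, here is a rough sketch of how a find, fix, verify pipeline like the one described above might be wired together. It is not Soylent's actual implementation: the worker functions stand in for Mechanical Turk tasks, and the names, the agreement threshold and the majority-vote rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-ins for Mechanical Turk tasks (assumed, not Soylent's API).
FindWorker = Callable[[str], List[str]]    # returns spans of text that need work
FixWorker = Callable[[str], str]           # returns a rewrite of one span
VerifyWorker = Callable[[str, str], bool]  # votes on whether a rewrite is acceptable


def find_fix_verify(
    text: str,
    finders: List[FindWorker],
    fixers: List[FixWorker],
    verifiers: List[VerifyWorker],
    min_agreement: int = 2,  # assumed threshold; blunts one overeager finder
) -> Dict[str, List[str]]:
    """Return accepted rewrites for each span that enough finders flagged."""
    # Find: count how many independent workers flagged each span.
    flagged: Counter = Counter()
    for find in finders:
        for span in set(find(text)):
            flagged[span] += 1
    spans = [s for s, votes in flagged.items() if votes >= min_agreement]

    suggestions: Dict[str, List[str]] = {}
    for span in spans:
        # Fix: collect distinct rewrites; a "lazy" no-change answer is dropped.
        candidates = {fix(span) for fix in fixers}
        candidates.discard(span)
        # Verify: keep rewrites that a strict majority of verifiers accept.
        accepted = [
            c for c in sorted(candidates)
            if sum(verify(span, c) for verify in verifiers) * 2 > len(verifiers)
        ]
        if accepted:
            suggestions[span] = accepted
    return suggestions
```

Because each stage only sees a small span and needs agreement from other workers, a single lazy or overeager worker has limited influence on the result, which is the point Bernstein makes about splitting the work.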

To news traditionalists, Soylent may sound like the latest turn toward outsourcing in journalism that has sent copy editing jobs to places in India. It could also be akin to the automated journalism being tested by some companies or the Huffington Post’s real-time headline testing. And some day it may be. But Soylent is far from ready for the mainstream, thanks to the processing time and payment methods. Bernstein says they’re working towards having real-time edits and managing payment through Soylent, as well as adapting the program to work on photo editing. Instead of outsourcing, think of Soylent as microsourcing.

And about that name: It comes from exactly what you’re thinking. Bernstein said they were looking for something familiar but also true to the idea of what they created. Soylent is made of people. It is, indeed, people.

“The original name was Homunculus,” Bernstein said. “It didn’t have the same ring to it.”

October 04 2010

12:24

Open data meets FOI via some nifty automation

OpenlyLocal generated FOI request

Now this is an example of what’s possible with open data and some very clever thinking. Chris Taggart blogs about a new tool on his OpenlyLocal platform that allows you to send a Freedom of Information (FOI) request based on a particular item of spending. “This further lowers the barriers to armchair auditors wanting to understand where the money goes, and the request even includes all the usual ‘boilerplate’ to help avoid specious refusals.”

It takes around a minute to generate an FOI request.

The function is limited to items of spending above £10,000. Cleverly, it’s also all linked so you can see if an FOI request has already been generated and answered.

Although the tool sits on OpenlyLocal, Francis Irving at WhatDoTheyKnow gets enormous credit for making their side of the operation work with it.

Once again you have to ask why a media organisation isn’t creating these sorts of tools to help generate journalism beyond the walls of its newsroom.

August 11 2010

17:10

Importing XML / RSS feeds into WordPress from another CMS

Have you exported stories via XML from a CMS and then imported them to WordPress using FTP access? If you have, how did you do it? Thanks for any help you can provide!

May 18 2010

16:00

Mediagazer: From zero to big traffic driver in just two short months

Last week we were perusing our Google Analytics report here at the Lab and one data point stood out: A site barely two months old had inched into our top 10 referring sites for the previous month. Checking today, it’s up into our top five, passing up many more traditional traffic drivers.

The site is Mediagazer, the media-focused offshoot of the popular technology site Techmeme, and like its sibling it combines editors and an algorithm to gather the best stories on its subject from around the web. On Monday, Mediagazer debuted a feature called Leaderboard (it came first to Techmeme) which ranks news-about-news sources in terms of their prominence on Mediagazer. (We fare well on it, but I swear that’s not why we’re interested.)

I spoke with the site’s editor Megan McCarthy about how the site became a traffic-driver so quickly. McCarthy credits the site’s addictive quality: People arrive via the online equivalent of word of mouth, like social media, and once they’re there, a hefty (though undisclosed) percentage keep coming back. The site already has a core readership that checks in every day, McCarthy said. Mediagazer refreshes every five minutes, thanks to the algorithm searching the web for new content getting linked by other sites; meanwhile, McCarthy is trolling the web for links the algorithm might not have seen yet and prioritizing the ones it has. On a typical day, Mediagazer links to about 40 stories. (McCarthy would not disclose monthly traffic statistics.)

Mediagazer isn’t entering an empty space; from Romenesko to our own Twitter feed, there are plenty of people sorting through the media news of the day. Mediagazer’s scope is broader than, say, ours, including things like new TV lineups and media criticism we wouldn’t cover. Mediagazer joins the other sites run by Techmeme: political news at Memeorandum, celebrity gossip at WeSmirch, and baseball at BallBug, although those three sites are purely automated with no human intervention.

The site is also active on Twitter, sending out the links it posts, with the tweak of including the personal Twitter handle of the author who wrote the post. (The tweet attribution is automated, but requires a one-time setup process with the help of a human.) McCarthy said they want to let journalists know about Mediagazer (I certainly noticed the @ mentions showing up in my Twitter feed) and they want to give readers another opportunity to drill down into a subject area of interest.

“I want anyone who looks at the site to know, not only what’s going on [in the media industry], but what’s going to happen,” McCarthy said.

The combination of links, frequent updates, and obsessive readers seems to create the kind of place that active tweeters and bloggers would stop by. That target audience is clear in the kind of advertisements Mediagazer serves — they seem to be primarily from companies that provide software services to bloggers. It also probably explains why we’re seeing so many Mediagazer readers coming our way.

December 08 2009

19:05

5 Tools to Help Automate Local Advertising

Promises of whiter teeth, IQ quizzes, and digital dancing people clutter online ads these days. At the same time, experts at future-of-journalism conferences are declaring that news will never again be solely supported by advertising. Neither one tells the full story of the present and future of online advertising for hyper-local and other news websites.

Experiments with new advertising technology are popping up everywhere. Websites are trying to reach smaller, local advertisers that have been underserved for years by legacy media. This local and hyper-local ad market will be a significant part of the future of journalism, says Jeff Jarvis, author of "What Would Google Do?" and associate professor and director of the interactive journalism program at the City University of New York.

Even Twitter is dropping hints about its advertising plans. Founder Biz Stone said the company is not considering text and display ads for Twitter's home page, but he told Reuters on November 25 that the company plans to make money with "non-traditional" advertising. Stone didn't define what non-traditional ads will look like, but here are five examples of new tools that websites can use to make money from advertising.

5 Tools for Automating Local Ads

1. PlaceLocal: A new, hosted solution that allows publishers to automate local ad creation and sales. It's operated by PaperG, a startup led by Victor Wong, who is taking time off from Yale to develop his business. PlaceLocal automatically builds customized ads for any local business using just its name and address. The tool can even create a landing page for a small advertiser, Wong said in an interview during which he demonstrated the tool.

The technology builds an ad using algorithms, and by searching databases and the web for reviews, photos, or entertainment listings. It also filters out any content that has a negative tone. The tool allows advertising representatives or publishers to easily build ads on spec and use them as a sales tool, Wong said.

"Some of our partners are using it to crawl their own databases," he said. The tool can be deployed on a publisher's ad servers or run separately, and payment to PaperG is on a revenue-share basis, he said. "If customers aren't buying ads, we won't make money," he said. "If they are, we will make money." Several media properties are testing the software, Wong said, and a public launch is planned this month. PaperG raised $1.1 million in its second round, from people like the former Boston.com publisher Steve Taylor and Mark Potts, CEO of GrowthSpur, according to paidContent.

2. Dynamic ads: Offered by TheDigitel in Charleston, S.C., this tool allows advertisers to change ad content dynamically via a text message, Facebook or blog update, or using a Twitter or Flickr photo feed. "If you can feed it, our thing can eat it," claims TheDigitel's website. Advertisers fill out a form to create the ad and designate which parts are static, and which are dynamic.

3. Flyerboard: A virtual bulletin board that enables small, local advertisers to create flyers that are then distributed to hyper-local websites. This is another offering from PaperG. The tool, which lets readers share the flyers on Facebook and Twitter, is deployed at sites like the New Haven Independent, Boston.com's Newton, and some of Hearst's local sites, like The Woodlands in the Houston suburbs. Flyerboard is a permanent widget installed on local sites, and revenue is shared between the site and PaperG. Wong said Flyerboard has generated 1% clickthrough rates for ads on some hyper-local sites, outperforming traditional advertising.

4. Real time ads: These are delivered at MinnPost.com, a non-profit site covering Minnesota. Joel Kramer, CEO and editor of the site, showed off the concept during a panel at the Online News Association conference in early October. MinnPost champions the ads as a simple way to avoid creating specific messages just for one website. Rather, small advertisers can harness an RSS feed from an existing blog or business tweet stream. The text is displayed on a widget at MinnPost.com, with a link to a pop-up page displaying the text with images and links to originating websites.

5. Self-serve ads: Multiple examples of self-service ad vendors exist for print and the web, such as the Instiads offered through Neighborlogs, a placeblogging platform based in Seattle, and PageGage. Also, AdReady is used by The New York Times. A partnership for a self-service ad network was announced in September between the Tribune Company and MediaSpectrum.
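
The real-time ads in item 4 reuse text an advertiser already publishes through an RSS feed. As a minimal sketch of that idea, and not MinnPost's actual code, the snippet below pulls the first item from an RSS 2.0 feed and trims it to fit a small widget. The function name, the sample feed and the 140-character limit are assumptions for illustration, and it assumes the feed lists its newest item first.

```python
import xml.etree.ElementTree as ET

# Hypothetical advertiser feed, inlined so the example runs without a network call.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Corner Cafe</title>
    <item>
      <title>Half-price lattes until noon</title>
      <link>http://example.com/specials</link>
    </item>
    <item>
      <title>Live music on Friday</title>
      <link>http://example.com/music</link>
    </item>
  </channel>
</rss>
"""


def latest_ad_text(rss_xml: str, max_chars: int = 140) -> dict:
    """Build widget text and a link from the first item of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml.strip())
    item = root.find("./channel/item")  # first item; assumed to be the newest
    if item is None:
        raise ValueError("feed has no items")
    title = (item.findtext("title") or "").strip()
    link = (item.findtext("link") or "").strip()
    # Trim long headlines so they fit the widget (limit is an assumption).
    if len(title) > max_chars:
        title = title[: max_chars - 3].rstrip() + "..."
    return {"text": title, "link": link}


ad = latest_ad_text(SAMPLE_FEED)
```

A site would fetch the advertiser's feed on a schedule and re-render the widget, so the ad changes whenever the business posts something new.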

Most of these services offer hosting and billing assistance in exchange for a percentage of revenue. Other companies, such as Trafficspaces, offer advertising software-as-a-service with monthly fees. Trafficspaces has also launched a free version with sponsored ads. Another ad management company, isocket, is in private beta with TechCrunch, and recently received $2 million in seed funding. Mobile self-serve ad tools have sprouted up as well, such as Zeep Media, which sets ad prices via auction, and Mojiva.

Sharing Space

Beyond the new technology offerings and platforms, traditional media organizations have begun sharing ad space with smaller publishers, borrowing the ad network concept from digital natives like the Blogher network of independent blogs for women. The Miami Herald, a McClatchy newspaper, has launched a Community News Network and is partnering with local websites for content on the Herald's site and sharing ad positions on those pages.

Andria Krewson is a freelance journalist and consultant from Charlotte, N.C. She has worked at newspapers for 27 years, focusing on design and editing of community niche publications. She blogs for her neighborhood at Under Oak, writes occasionally as a Tar Heel mom at The Daily Tar Heel and covers changing culture at Crossroads Charlotte. Twitter: underoak.

This is a summary. Visit our site for the full post ».

