
May 15 2013

12:20

The newsonomics of where NewsRight went wrong


Quietly, very quietly, NewsRight — once touted as the American newspaper industry’s bid to protect its content and make more money from it — has closed its doors.

Yesterday, it held a concluding board meeting aimed at tying up loose ends. That meeting follows a put-your-best-face-on-it press release issued two weeks ago. Though the news has been out there, hardly a whimper was heard.

Why?

Chalk it up, first, to how few people are really still covering the $38.6 billion U.S. newspaper industry. Then add in the fact that the world is changing rapidly. Piracy protection has declined as a top publisher concern. Google’s snippetization of the news universe is bothersome, but less of a central issue. The declining relative value of the desktop web — where NewsRight was primarily aimed — in the mobile age played a part. Non-industry-owned players like NewsCred (“The newsonomics of recycling journalism”) have been born, offering publishers revenue streams similar to those that NewsRight itself was intended to create.

Further, new ways to value news content — through all-access subscriptions and app-based delivery, content marketing, marketing services, innovative niching and more — have all emerged in the last couple of years.

Put a positive spin on it, and the U.S. newspaper industry is looking forward, rather than backward, as it seeks to find new ways to grow reader and ad revenues.

That’s all true. But it’s also instructive to consider the failure of NewsRight.

It’s easy to deride it as NewsWrong. It’s one of those enterprises that may just have been born under a bad sign. Instead of the stars converging, they collided.

NewsRight emerged as an Associated Press incubator project. If you recall the old AP News Registry and its “beacon,” NewsRight became its next iteration. It was intended to track news content as it traversed the web, detecting piracy along the way (“Remember the beacon”). It was an ambitious databasing project, at its peak taking in feeds from more than 900 news sites. The idea: create the largest database of current news content in the country, both categorized by topic and increasingly trackable as it was used (or misused) on the web.

AP initially incentivized member newspapers to contribute to the News Registry by discounting some of their annual fees. Then a bigger initiative emerged, first called the News Licensing Group (NLG). The strategy: harness the power of the growing registry to better monetize newspaper content through smart licensing.

NLG grew into a separate company, with AP contributing the registry’s intellectual property and becoming one of 29 partners. The other 28: U.S. daily newspaper companies and the leading European newspaper and magazine publisher Axel Springer. Those partners collectively committed more than $20 million — though they ended up spending only something more than half of that before locking up the premises.

Renamed NewsRight, it was an industry consortium, and here a truism applies: It’s tougher for a consortium — aimed as much at defense as at offense — to innovate and adjust quickly. Or, to put it in vaudevillian terms: Dying is easy — making decisions among 29 newspaper companies can be torture.

It formally launched just more than a year ago, in January 2012 (“NewsRight’s potential: New content packages, niche audiences, and revenue”), and the issues surfaced immediately. Let’s count the top three:

  • Its strategy was muddled. Was it primarily a content-protection play, bent on challenging piracy and misuse? Or was it a way to license one of the largest collections of categorized news content? Which way did it want to go? Instead of deciding between the two, it straddled both.
  • In May 2011, seven months before the launch, the board had picked TV veteran David Westin as its first CEO. Formerly head of ABC News, he seemed an odd fit from the beginning. A TV guy in a text world. An analog guy in a digital world. Then friction between Westin and those who had hired him — including then-AP CEO Tom Curley — only complicated the strategic indecision. Westin was let go in July, which, as I noted then, was the beginning of the end.
  • Publishers’ own interests were too tough to balance with the common good. Though both The New York Times Company and AP were owners, it was problematic to include feeds of the Times and AP in the main NewsRight “catalog.” The partners tried to find prices suitable for the high-value national content (including the Times and AP) and the somewhat lesser-valued regional content, but that exercise proved difficult, and antitrust laws made execution harder still. Potential customers, of course, wanted the Times and AP as part of any deal, so dealmaking was hampered.

Further, all publishers take in steady revenue streams — collectively in the tens of millions — from enterprise licensees like LexisNexis, Factiva, and Thomson Reuters, as well as from the education and copyright markets. NewsRight’s owners (the newspaper companies) didn’t want NewsRight to get in the way of those revenue streams — and those were the only licensing streams that had proven lucrative over time.

Long story short, NewsRight was hobbled from the beginning, and in its brief life it was able to announce only two significant customers, Moreover and Cision, and several smaller ones.

How could it have been so difficult?

It’s understandable on one level. Publishers have seethed with rage as they’ve seen their substantial investment in newsrooms harvested — for nothing — by many aggregators from Google to the tens of thousands of websites that actually steal full-text content. Those sites all monetize the content with advertising, and, save a few licensing agreements (notably with AP itself), they share little in the way of ad revenue.

But rage — whether seething or public — isn’t a business model.

Anti-piracy itself has also proven not to be much of a business model. Witness the tribulations of Attributor, a content-tracking service (with AP among its investors) that used some pretty good technology to track pirated content. It couldn’t get the big ad providers to act on piracy, though. Last year, after pointing its business in the direction of book industry digital rights management, it was sold for a meager $5.6 million to Digimarc.

So if anti-piracy wasn’t much of a business model, the question turned to who would pay to license NewsRight’s feed of all that content, or subsets of it.

Given that owner-publishers wanted to protect their existing licensing streams, NewsRight turned its sights to an area that had not been well monetized: media monitoring.

Media monitoring is a storied field. When I did content syndication for Knight Ridder at the turn of the century, I was lucky enough to visit Burrelles (now BurrellesLuce) in Livingston, New Jersey. In addition to a great auto tour of Tony Soprano country, I got to visit the company in the midst of transition.

In one office, older men with actual green eyeshades meticulously clipped periodicals (with scissors), monitoring company mentions in the press. The company then took the clips and mailed them. That’s a business that sustained many a press agent for many a decade: “Look, see the press we got ya!”

In Burrelles’ back rooms, the new digital monitoring of press mention was beginning to take form. Today, media monitoring is a good, if mature, industry segment, dominated by companies like Cision, BurrellesLuce, and Vocus, as social media monitoring and sentiment analysis both widen and complicate the field. Figure there are more than a hundred media monitoring companies of note.

Yet even within the relatively slim media monitoring segment, NewsRight couldn’t get enough traction fast enough. Its ability to grow revenues there — and then to pivot into newer areas like mobile aggregation and content marketing — ran into the frustrations of the owner-newspapers. So they pulled the plug, spending less than they had actually committed. They decided to cut their losses and move on.

Moving on meant making NewsRight’s last deal. The company — which has let its fewer than 10 employees go — announced that it had “joined forces” with BurrellesLuce and Moreover. It’s a face-saver — and maybe more.

Those two companies will try to extend media monitoring contracts for newspaper companies. BurrellesLuce (handling licensing and aggregation) and Moreover (handling billing and tracking) will make content available under the NewsRight name. The partnership’s new CAP (Compliant Article Program) seeks to advance contracting for digital media monitoring rights, a murky legal area. If CAP works, publishers, Moreover, and BurrellesLuce will share in the new revenue.

What about NewsRight’s anti-piracy mandate? That advocacy position transitions over to the Newspaper Association of America.

NAA is itself in the process of being restyled into a new industry hub (with its merger and more) under new CEO Caroline Little. “As both guardian and evangelist for the newspaper industry, the NAA feels a tremendous responsibility to protect original content generated by its members,” noted Little in the NewsRight release.

What about the 1,000-title content database, the former AP registry that had formed the nucleus of NewsRight? It’s in limbo, and isn’t part of the BurrellesLuce/Moreover turnover. Its categorization technology has stumbled, and the system overall needs an upgrade.

There’s a big irony here.

In 2013, we’re seeing more innovative use of news content than we have in a long time. From NewsCred’s innovative aggregation model to Flipboard’s DIY news magazines, from new content marketing initiatives at The New York Times, Washington Post, Buzzfeed, and Forbes to regional agency businesses like The Dallas Morning News’ Speakeasy, there are many new ways news content is being monetized.

We’re really in the midst of a new content re-evaluation. No one makes the mistake this time around of calling news content king, but its value is being reproven amid these fledgling strategies.

Maybe the advent of a NewsCred — which plainly understood a new kind of content aggregation better and built better technology to value it — makes NewsRight redundant. That’s in a sense what the partners decided: let the staffs of BurrellesLuce and Moreover and the smarts of the NewsCreds make sense of whatever newer licensing markets are out there. Let them give the would-be buyers what they want: a licensing process that is as simple as it can be. One-stop, one-click, or as close as you can manage to that. While the disbanding of NewsRight seems to take the news industry in the opposite, more atomized direction, in one way it may be the third-party players who succeed here.

So is it that NewsRight is ending with a whimper, or maybe a sigh of relief? Both, plainly. It’s telling that no one at NewsRight was either willing or able to talk about the shutdown.

Thumbs down to content consortia. Thumbs up to letting the freer market of entrepreneurs make sense of the content landscape, with publishers getting paid something for what they still know how to do: produce highly valued content.

July 06 2010

14:00

The ASCAP example: How news organizations could liberate content, skip negotiations, and still get paid

Jason Fry suggested in a post here last week that current paywall thinking might be just a temporary stop along the way to adoption of “paytags — bits of code that accompany individual articles or features, and that allow them to be paid for.” But how? As Fry recognizes, “between wallet friction and the penny gap, the mechanics of paytags make paywalls and single-site meters look like comparatively simple problems to solve.”

I suggested a possible framework for a solution during a couple of sessions at the conference “From Blueprint to Building: Making the Market for Digital Information,” which took place at the University of Missouri’s Reynolds Journalism Institute June 23-25. Basically, my “what-if” consisted of two questions:

  1. What if news content owners and creators adopted a variation on the long-established ASCAP-BMI performance rights organization system as a model by which they could collect payment for some of their content when it is distributed outside the boundaries of their own publications and websites?
  2. And, taking it a step further, what if they used a variant of Google’s simple, clever, and incredibly successful text advertising auction system to establish sales-optimizing pricing for such content?

News publishers have been tying themselves in knots for the last few years deciding whether or not to charge readers for content, and if so, how much and in what fashion — micropayments, subscriptions, metered, freemium and other ideas have all been proposed and are being tested or developed for testing.

As well, publishers have complained about the perceived misuse of their content by aggregators of all stripes and sizes, from Google News down to neighborhood bloggers. They’ve expressed frustration (“We’re mad as hell and we are not going to take it anymore,” Associated Press chair Dean Singleton said last year), and vowed to go after the bandits.

But at the same time, many publishers recognize that it’s to their advantage to have their content distributed beyond the bounds of their own sites, especially if they can get paid for it. When radio was developed in the 1920s, musicians and music publishers recognized they would benefit from wider distribution of their music through the new medium, but they needed a way to collect royalties without each artist having to negotiate individually with each broadcaster.

A model from music

That problem was solved by using a non-profit clearinghouse, ASCAP (American Society of Composers, Authors and Publishers), which had been formed in 1914 to protect rights and collect royalties on live performances. Today the performance-rights market in the U.S. is shared between ASCAP, BMI (Broadcast Music Incorporated, founded by broadcasters rather than artists) and the much smaller SESAC (formerly the Society of European Stage Authors & Composers). Using digital fingerprinting techniques, these organizations collect royalties on behalf of artists whose works are performed in public venues such as restaurants and shopping centers as well as on radio and television stations and streaming services such as Pandora.

Publishers have put a lot of effort into trying to confine news content to tightly-controlled channels such as their own destination websites, designated syndication channels, apps, and APIs in order to control monetization via advertising and direct user payments. But when content moves outside those bounds, as it can very easily, publishers have no way to regulate it or collect fees — so they cry foul and look for ways to stop the piracy or extract payments from the miscreants.

Among the content-protection schemes, AP is rolling out News Registry, which it touts as a way of at least tracking the distribution of content across the web, whether authorized or not, and Attributor offers “anti-piracy” services by “enforcement experts” to track down unauthorized use of content. But for now, content misuse identified by these systems will require individual action to remove it or force payment. In the long run, that’s not a viable way to collect royalties.

Suppose, instead, that news publishers allowed their content to be distributed anywhere online (just as music can be played by any radio station) as long as it were licensed by a clearinghouse, similar to ASCAP and BMI, that would track usage, set prices, and channel payments back to the content creator/owner?

To do this, perhaps the paytags Fry suggested are needed, or perhaps publishers can learn from the music industry and use the equivalent of the digital fingerprints that allow ASCAP’s MediaGuide to track radio play. (The basic technology for this is around: AP’s News Registry uses hNews microtags as well as embedded pixels (“clear GIFs”); Attributor’s methodology is closer to the digital fingerprinting technique.)

How it could work

The system for broadcast and performance music payments is a three-way exchange consisting of (a) artists and composers, (b) broadcasters and performance venues, and (c) performance rights organizations (ASCAP and BMI).

In the news ecosystem the equivalents would be (a) content creators and owners, (b) end users including both individual consumers and “remixers” (aggregators, other publishers, bloggers, etc.); and (c) one or more content clearinghouses providing services analogous to those of ASCAP and BMI.

The difference between a news payments clearinghouse and the music industry model would be in scale, speed and complexity. In the news ecosystem, just as in the music world, there are potentially many thousands of content creators — but there are millions of potential end users, compared to a manageable number of radio stations and public performance venues paying music licensing fees. And there are far more news stories than musical units; they’re distributed faster and are much shorter-lived than songs. In the radio and public performance sphere, music content still travels hierarchically; that was true in the news business 20 years ago, but today news travels in a networked fashion.

To handle the exchange of rights and content in this vastly more complex environment, a real-time variable pricing model could be developed, benefiting both the buyers and sellers of content. Sellers benefit because with variable pricing, or price discrimination, sales and revenue are maximized, since content goods are sold across the price spectrum to various buyers at the price each is willing to pay — think of the way airline seats are sold. Buyers benefit because they can establish the maximum price they are willing to pay. They may not be able to buy at that price, but they are not subject to the take-it-or-leave-it of fixed pricing.
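To make the arithmetic concrete, here is a toy example (every number is invented) comparing the revenue from the best single fixed price with the revenue from selling to each buyer at the price that buyer is willing to pay:

    # Toy demand schedule: each buyer's maximum willingness to pay for one article.
    # All numbers are invented for illustration.
    willingness_to_pay = [0.50, 0.25, 0.10, 0.05, 0.02]

    def revenue_at_fixed_price(price, wtp):
        """Revenue if every buyer faces the same take-it-or-leave-it price."""
        return sum(price for w in wtp if w >= price)

    # Best single price: try each candidate and keep the highest total revenue.
    best_fixed = max(revenue_at_fixed_price(p, willingness_to_pay)
                     for p in willingness_to_pay)

    # Price discrimination: each buyer pays exactly what he or she is willing to pay.
    discriminated = sum(willingness_to_pay)

    print(best_fixed)      # 0.50 -- e.g., charge $0.25 and sell to two buyers
    print(discriminated)   # 0.92 -- every buyer is served, each at a different price

Even in this tiny example, selling along the demand curve nearly doubles the revenue available at the best single price.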

When it comes to news content, a variable pricing strategy was suggested last year by Albert Sun, then a University of Pennsylvania student and now a graphics designer with The Wall Street Journal. (Sun also wrote a senior thesis on the idea called “A Mixed Bundling Pricing Model for News Websites.”) The graphs in his post do a good job showing how a price-discrimination strategy can maximize revenue; it was also the subject of one of my posts here at the Lab.

A well-known real-time variable pricing arrangement is the Google AdSense auction system, which establishes a price for every search ad sold by Google. Most of these ads are shown to users at no cost to the advertisers; they pay only when a user clicks on the ad. The price is determined individually for each click, via an algorithm that takes into account the maximum price the advertiser is willing to pay; the prices other advertisers on the same search page are willing to pay; and the relative “Quality Score” (a combination of clickthrough rate, relevancy, and landing page quality) that Google assigns to each advertiser. It works extraordinarily well, not only for advertisers but for Google, which reaps more than $20 billion in annual revenue from it.
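As a rough sketch of that mechanism (not Google’s actual implementation, whose details are proprietary), here is a simplified quality-weighted second-price auction; the names, numbers, and one-cent increment are all illustrative:

    from dataclasses import dataclass

    @dataclass
    class Bid:
        advertiser: str
        max_cpc: float        # maximum price the advertiser will pay per click
        quality_score: float  # stand-in for clickthrough rate, relevancy, landing page quality

    def rank_and_price(bids):
        """Simplified quality-weighted second-price auction (illustrative only).

        Ads are ranked by bid x quality; each ad pays just enough to keep its rank
        ahead of the next ad's rank score, and never more than its own maximum bid."""
        ranked = sorted(bids, key=lambda b: b.max_cpc * b.quality_score, reverse=True)
        priced = []
        for i, b in enumerate(ranked):
            if i + 1 < len(ranked):
                nxt = ranked[i + 1]
                price = min(b.max_cpc,
                            nxt.max_cpc * nxt.quality_score / b.quality_score + 0.01)
            else:
                price = 0.01  # an arbitrary floor for the last-ranked ad
            priced.append((b.advertiser, round(price, 2)))
        return priced

    print(rank_and_price([
        Bid("A", max_cpc=2.00, quality_score=0.9),
        Bid("B", max_cpc=3.00, quality_score=0.5),
        Bid("C", max_cpc=1.00, quality_score=0.8),
    ]))
    # [('A', 1.68), ('B', 1.61), ('C', 0.01)] -- the highest bidder, B, does not rank first

The point for this discussion is that the per-click price emerges from the interplay of bids and quality scores rather than from a fixed rate card.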

Smart economist needed

What’s needed in the news ecosystem is something similar, though quite a bit more complex. Like the Google auction, the buyer’s side would be simple: buyers (whether individuals or remixers such as aggregators) establish a maximum price they are willing to pay for a particular content product — this could be an individual story, video, or audio report, or it could be a content package, like a subscription to a topical channel. This maximum price is determined by an array of factors that will be different for every buyer, but may include timeliness, authoritativeness, relevance to the buyer’s interests, etc., and may also be affected by social recommendations or the buyer’s news consumption habits. But for the purposes of the algorithm, all of these factors are distilled in the buyer’s mind into a maximum price point.

The seller is the content creator or owner who has agreed to share content through the system, including having remixers publish and resell it. Sellers retain ownership rights, and share revenue with the remixer when a transaction takes place. The price that may be acceptable to a content owner/seller will vary (a) by the owner’s reputation or authority (this is analogous to Google’s assignment of a reputation score to advertisers), and (b) by time — since generally, the value of news content will drop quickly within hours or days of its original publication.

The pricing algorithm, then, needs to take into account both the buyer’s maximum price point and the seller’s minimum acceptable price based on time and reputation; and at least two more things: (a) the uniqueness of the content — is it one of several content items on the same topic (multiple reports on an event from different sources), or is it a unique report not available elsewhere (a scoop, or an enterprise story) — and (b) the demand for the particular piece of content — is it popular, is it trending up, or has it run its course?
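The real algorithm, as I say below, remains a black box for a clever economist to invent; purely to make these inputs concrete, here is a toy sketch in which every weight, decay rate, and number is invented:

    def seller_floor(base_price, hours_old, reputation):
        """Hypothetical minimum acceptable price: scaled by reputation (0-1),
        halving roughly every 24 hours as the story ages."""
        return base_price * reputation * 0.5 ** (hours_old / 24.0)

    def clearing_price(buyer_max, base_price, hours_old, reputation, uniqueness, demand):
        """Toy clearing rule: uniqueness and demand (both 0-1) lift the price from the
        seller's floor toward the buyer's maximum; returns None when no sale clears."""
        floor = seller_floor(base_price, hours_old, reputation)
        if buyer_max < floor:
            return None  # many items would clear at zero or not at all
        lift = 0.5 * uniqueness + 0.5 * demand
        return round(floor + (buyer_max - floor) * lift, 3)

    # A fresh, unique, trending scoop from a high-reputation source:
    print(clearing_price(buyer_max=0.40, base_price=0.30, hours_old=2,
                         reputation=0.9, uniqueness=0.9, demand=0.8))   # ~0.38
    # A three-day-old commodity story: the price collapses toward zero.
    print(clearing_price(buyer_max=0.05, base_price=0.30, hours_old=72,
                         reputation=0.6, uniqueness=0.2, demand=0.1))   # ~0.03

Swap in real economics and real data and the shape stays the same: a floor set by the seller, a ceiling set by the buyer, and the content’s uniqueness and demand deciding where between them the sale clears.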

The outcome of this auction algorithm would be that different prices would be paid by different buyers of the same content — in other words, sales would occur at many points along the demand curve as illustrated in Sun’s post, maximizing revenue. But it’s also likely that the system would establish a price of zero in many cases, which is an outcome that participating publishers would have to accept. And of course, many remixers would choose to offer content free and step into the auction themselves as buyers of publication rights rather than as resellers.

In my mind, the actual pricing algorithm is still a black box, to be invented by a clever economist. For the moment, it’s enough to say that it would be an efficient, real-time, variable pricing mechanism, maintained by a clearinghouse analogous to ASCAP and BMI, allowing content to reach end users through a network, rather than only through the content creator’s own website and licensees. Like ASCAP and BMI, it bypasses the endless complexities of having every content creator negotiate rights and pricing with every remixer. The end result would be a system in which content flows freely to end users, the value of content is maximized, and revenue flows efficiently to content owners, with a share to remixers.

Clearly, such a system would need a lot of transparency, with all the parties (readers, publishers, remixers) able to see what’s going on. For example, if multiple news sources have stories on the same event, they might be offered to a reader at a range of prices, including options priced above the reader’s maximum acceptable price.

Protecting existing streams

Just as ASCAP and BMI play no role when musicians sell content in uncomplicated market settings the musicians can control — for example, concert tickets, CD sales, posters, or other direct sales — this system would not affect pricing within the confines of the content owner’s own site or its direct licensees. But by enabling networked distribution and sales well beyond those confines, it has the potential to vastly increase the content owner’s revenue. And, the system need not start out with complex, full-blown real-time variable pricing machinery — it could begin with simpler pricing options (as Google did) and move gradually toward something more sophisticated.

Now, all of this depends, of course, on whether the various tentative and isolated experiments in content pricing bear fruit. I’m personally still a skeptic on whether they’ll work well outside of the most dominant and authoritative news sources. I think The New York Times will be successful, just as The Wall Street Journal and Financial Times have been. But I doubt whether paywalls at small regional newspapers motivated by a desire to “protect print” will even marginally slow down the inevitable transition of readers from print to digital consumption of news.

A better long-term strategy than “protect print” would be to move to a digital ecosystem in which any publisher’s content, traveling through a network of aggregators and remixers, can reach any reader, viewer, or listener anywhere, with prices set efficiently and on the fly, and with the ensuing revenue shared back to the content owner. The system I’ve outlined would do that. By opening up new potential markets for content, it would encourage publishers to develop higher-value content, and more of it. The news audience would increase, along with ad revenue, because content would travel to where the readers, listeners, or viewers are. Aggregators and other remixers would be incentivized to join the clearinghouse network. Today, few aggregators would agree to compensate content owners for the use of snippets. But many of them would welcome an opportunity to legitimately use complete stories, graphics, and videos in exchange for royalties shared with the content creators and owners.

Granted, this system would not plug every leak. If you email the full text of a story to a friend, technically that might violate a copyright — just like sharing a music file does — but the clearinghouse would not have the means to collect a fee (although the paytag, if attached, might at least track that usage). There will be plenty of sketchy sites out there bypassing the system, just as there are sketchy bars that have entertainment but avoid buying an ASCAP license.

But a system based on a broadly-agreed pricing convention is more likely to gain acceptance than one based on piracy detection and rights enforcement. Like ASCAP’s, the system would require a neutral, probably nonprofit, clearinghouse.

How could such an entity be established, and how would it gain traction among publishers, remixers and consumers? Well, here’s how ASCAP got started: It was founded in 1914 by Victor Herbert, the composer, who was well-connected in the world of musicians, composers, music publishers and performance venues, and who had previously pushed for the adoption of the 1909 Copyright Act. Herbert enlisted influential friends like Irving Berlin and John Philip Sousa.

Today, just as a few outspoken voices like Rupert Murdoch are moving the industry toward paywalls, perhaps a few equally influential voices can champion this next step, a pricing method and payments clearinghouse to enable publishers to reap the value of content liberated to travel where the audience is.

Acknowledgments/disclosures: The organizer of the conference where I had the brainstorm leading to this idea, Bill Densmore, has spent many years thinking about the challenges and opportunities related to networked distribution, payment systems, and user management for authoritative news content. A company he founded, Clickshare, holds patents on related technology, and for the last two years he has worked at the University of Missouri on the Information Valet Project, a plan to create a shared-user network that would “allow online users to easily share, sell and buy content through multiple websites with one ID, password, account and bill.” Densmore is also one of my partners in a company called CircLabs, which grew out of the Information Valet Project. The ideas presented in this post incorporate some of Densmore’s ideas, but also differ in important ways including the nature of the pricing mechanism and whether there’s a need for a single ID.

Photo by Ian Hayhurst used under a Creative Commons license.

March 23 2010

18:27

Why Newsrooms Don't Use Plagiarism Detection Services

Six years ago, in the wake of the Jayson Blair scandal at the New York Times, Peter Bhatia, then the president of the American Society of Newspaper Editors, gave a provocative speech at the organization's 2004 conference.

"One way to define the past ASNE year is to say it began with Jayson Blair and ended with Jack Kelley," he said.

Bhatia's message was that it was time for the industry and profession to take new measures to prevent serious breaches such as plagiarism and fabrication:

Isn't it time to say enough? Isn't it time to attack this problem in our newsrooms head-on and put an end to the madness? ... Are we really able to say we are doing everything possible to make sure our newspaper and staffs are operating at the highest ethical level?

Today, six years after his speech, another plagiarism scandal erupted at the New York Times (though it's certainly not on the scale of Blair's transgressions). A separate incident also recently unfolded at the Daily Beast. Once again, the profession is engaged in discussion and debate about how to handle this issue. One suggestion I made in my weekly column for Columbia Journalism Review was for newsrooms to start using plagiarism detection software to perform random checks on articles. New York Times public editor Clark Hoyt followed up with a blog post on this idea, and there has been other discussion.

Many people are left wondering how effective these services are, why they aren't being used in newsrooms, and which ones might be the best candidates for use in journalism. Surprisingly, it turns out that newsrooms are more interested in finding out who's stealing their content online than making sure the content they publish is original.

Why Newsrooms Don't Use Them

  1. Cost: The idea of spending potentially thousands of dollars on one of these services is a tough sell in today's newsrooms. "We've had a lot of conversation with media outlets, particularly after a major issue comes up, but the conversation is ultimately what is the cost, and whatever cost I give them, they think it's nuts," Robert Creutz, the general manager of the iThenticate plagiarism detection service, told me. He estimated his service, which is the most expensive one out there, would charge between $5,000 and $10,000 per year to a large newspaper doing random checks on a select number of articles every day. Many other detection services would charge far less, but it seems that any kind of cost is prohibitive these days.
  2. Workflow: When New York Times public editor Clark Hoyt asked the paper's "editor for information and technology" about these services, he was told the paper has concerns about the reliability of the services. Hoyt also wrote that "they would present major workflow issues because they tend to turn up many false-positive results, like quotes from newsmakers rendered verbatim in many places." News organizations are, of course, hesitant to introduce anything new into their processes that will take up more time and therefore slow down the news. They currently see these services as a time suck on editors, and think the delay isn't worth the results.
  3. Catch-22: In basic terms, these services compare a work against web content and/or against works contained in a database of previously published material. (Many services only check against web content.) Major news databases such as Nexis and Factiva are not part of the proprietary plagiarism detection databases, which means the sample group is not ideal for news organizations. As a result, news organizations complain that the services will miss potential incidents of plagiarism. But here's the flip side: If they signed up with these services and placed their content in the database, it would instantly improve the quality of plagiarism detection. Their unwillingness to work with the services is a big reason why the databases aren't of better quality.
  4. Complicated contracts: The Hartford Courant used the iThenticate service a few years ago to check op-ed pieces. "It's worth the cost," Carolyn Lumsden, the paper's commentary editor, told American Journalism Review back in 2004. "It doesn't catch absolutely everything, but it catches enough that you're alerted if there's a problem." When I followed up with her a few weeks back, she told me the paper had ended its relationship with the company. "We had a really good deal with them ... But then iThenticate wanted us to sign a complicated multipage legal contract. Maybe that's what they do with universities (their primary client). But we weren't interested in such entanglements."

The Strange Double Standard

So, as a result of the above concerns, almost no traditional news organizations use a plagiarism detection service to check their work either before or after publication. (On the other hand, Demand Media, a company that has attracted a lot of criticism for its lack of quality content and low pay, is a customer of iThenticate.) Here's the strange twist: Many of these same organizations are happy to invest the money, time and other resources required to use services that check if someone else is plagiarizing their work.


Jonathan Bailey, the creator of Plagiarism Today and president of CopyByte, a consultancy that helps advise organizations about detecting and combating plagiarism, said he's aware of many media companies that pay to check and see if anyone's stealing their content.

"It's fascinating because one of the companies I work with is Attributor ... and I'm finding lots of newspapers and major publications are on board with [that service], but they are not using it to see if the content they're receiving is original," he said. "It's a weird world for me in that regard. A lot of news organizations are interested in protecting their work from being stolen, but not in protecting themselves from using stolen work."

(MediaShift will publish a follow-up article tomorrow that looks at Attributor.)

How They Work

Bailey compares these services to search engines. Just as Google will take a query and check it against its existing database of web content, plagiarism detection services check a submitted work against an existing database of content.

"They work fundamentally with the same principles as search engines," he said. "They all take data from various sources and fingerprint it and compress it and store it in a database. When they find potential matches, they do a comparison."

Each service has its own algorithm to locate and compare content, and they also differ in terms of the size of their databases. Many of the free or less expensive services only search the current web. That means they don't compare material against archived web pages or proprietary content databases.
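To make the fingerprint-and-compare idea concrete, here is a minimal sketch using hashed word shingles and set overlap; real services use far more elaborate (and proprietary) matching, and everything in this example is illustrative:

    import hashlib

    def shingles(text, k=5):
        """Fingerprint a document as the set of hashes of every run of k consecutive words."""
        words = text.lower().split()
        grams = (" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1)))
        return {hashlib.md5(g.encode()).hexdigest()[:12] for g in grams}

    def similarity(doc_a, doc_b, k=5):
        """Jaccard overlap of the two fingerprint sets: 0.0 (nothing shared) to 1.0 (identical)."""
        a, b = shingles(doc_a, k), shingles(doc_b, k)
        return len(a & b) / len(a | b)

    submitted = "The mayor announced a new budget plan at the council meeting on Tuesday."
    archived = "The mayor announced a new budget plan at Tuesday's council meeting."
    print(round(similarity(submitted, archived), 2))  # 0.33 -- enough shared phrasing to flag

A score above some editorially chosen threshold would simply flag the pair for a human to review, which is also where the false-positive problem comes in: properly attributed quotes fingerprint just like lifted prose.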

Bailey said that another big difference between services is the amount of work they require a person to undertake in order to examine any potential matches. (This is the concern voiced by the editor at the New York Times.) Some services return results that list a series of potential matches, but don't explain which specific sentences or paragraphs seem suspect. This forces a person to spend time eliminating false positives.

Bailey also said some of the web-only services are unable to distinguish between content that is properly quoted or attributed and material that is stolen. This, too, can waste a person's time. However, he said that iThenticate, for example, does a decent job of eliminating the more obvious false positives, and that it has an API that enables it to be integrated into any web-based content management system.

Where They're Most Effective

Bailey has used and tested a wide variety of the plagiarism detection services available, and said they vary widely in terms of quality. Along with his experience, Dr. Debora Weber-Wulff, a professor of Media and Computing at Hochschule für Technik und Wirtschaft Berlin, has conducted two tests of these services. Her 2007 study is available here, and Bailey also wrote about her 2008 research on his website.

Asked to summarize the effectiveness of these services, Dr. Weber-Wulff offered a blunt reply by email: "Not at all."


"They don't detect plagiarism at all," she wrote. "They detect copies, or near copies. And they are very bad at doing so. They miss a lot, since they don't test the entire text, just samples usually. And they don't understand footnoting conventions, so they will flag false positives."

Her tests involved taking 31 academic papers that included plagiarized elements and running them through different services. Her data is important and worth looking at, though journalists should note that academic papers and news articles are not going to elicit the same results. The Hartford Courant seemed happy with its service, as was the Fort Worth Star-Telegram when it used one a few years ago, according to Clark Hoyt's blog post. On the other hand, the New York Times continues to have concerns.

For his part, Bailey mentioned a few services that might work for journalism.

"IThenticate do a very good job, they provide a good service but it is pricey," he said, "and it is very difficult to motivate newspapers to sign up when they're putting second mortgages on their buildings."

He also mentioned Copyscape.

"Copyscape is another one that is very popular and it's very cheap at only 5 cents a search," he said, noting it took the top spot in Dr. Weber-Wulff's latest study. "It's very good at matching -- it uses Google -- and it does a thorough job, though the problem is that it only searches the current web, so you have a limited database."

He recommends using Copyscape if the plan is to perform spot checks for a news organization. Bailey also mentioned SafeAssign as a third offering to look at.


In his view, it's unacceptable that news organizations are ignoring these services.

"The risks of not running a [plagiarism] check are incredibly high," he said, citing the damage that can be done to a brand, and the potential for lawsuits. "At the very least, they should be doing post-publication checking rather than letting the public-at-large or competitors do it for them -- because that's when things get particularly ugly.

Craig Silverman is an award-winning journalist and author, and the managing editor of MediaShift and Idea Lab. He is founder and editor of Regret the Error, the author of Regret the Error: How Media Mistakes Pollute the Press and Imperil Free Speech, and a weekly columnist for Columbia Journalism Review. Follow him on Twitter at @CraigSilverman.

