Tumblelog by Soup.io

May 23 2013


More on the tech side of the Reuters.com redesign

Building on our piece on the Reuters.com rethink, Source went back to get the nerdier details from Paul Smalera. Of note is that it’s all built on a unified API called Media Connect that generates the content feeds for all its new platforms and products:

The other thing this setup lets us do is show off the depth of Reuters content. We produce, including videos, text, pictures, and other content types, something like 20,000 unique items per day. But our current website really didn’t let us show off the depth of our reporting. So one of the main functions of the CMS is really set up to allow editors to create and curate collections — we call them streams — of stories. This lets us get to the endless-scroll type behavior that Facebook, Tumblr, Twitter, and the rest have made popular as the new default behavior of the web.

September 04 2012


LocalWiki Releases First API, Enabling Innovative Apps

We're excited to announce that the first version of the LocalWiki API has just been released!

What's this mean?

In June, folks in Raleigh, N.C., held their annual CityCamp event. CityCamp is a sort of "civic hackathon" for Raleigh. During one part of the event, people broke up into teams and came up with projects that used technology to help solve local, civic needs.


What did almost every project pitched at CityCamp have in common? "Almost every final CityCamp idea had incorporated a stream of content from TriangleWiki," CityCamp and TriangleWiki organizer Reid Seroz said in an interview with Red Hat's Jason Hibbets.

The LocalWiki API makes it really easy for people to build applications and systems that push and pull information from a LocalWiki. In fact, the API has already been integrated into a few applications. LocalWiki is an effort to create community-owned, living information repositories that will provide much-needed context behind the people, places, and events that shape our communities.

The winning project at CityCamp Raleigh, RGreenway, is a mobile app that helps residents find local greenways. They plan to push/pull data from the TriangleWiki's extensive listing of greenways.

Another group in the Raleigh-Durham area, Wanderful, is developing a mobile application that teaches residents about their local history as they wander through town. They're using the LocalWiki API to pull pages and maps from the TriangleWiki.

Ultimately, we hope that LocalWiki can be thought of as an API for the city itself -- a bridge between local data and local knowledge, between the quantitative and the qualitative aspects of community life.

Using the API

You can read the API documentation to learn about the new API. You'll also want to make sure you check out some of the API examples to get a feel for things.


We did a lot of work to integrate advanced geospatial support into the API, extending the underlying API library we were using -- and now everyone using it can effortlessly create an awesome geospatially aware API.

This is just the first version of the API, and there's a lot more we want to do! As we add more structured data to LocalWiki, the API will get more and more useful. And we hope to simplify and streamline the API as we see real-world usage.

Want to help? Share your examples for interacting with the API from a variety of environments -- jump in on the page on dev.localwiki.org or add examples/polish to the administrative documentation.

CityCamp photo courtesy of CityCamp Raleigh.

Philip Neustrom is a software engineer in the San Francisco Bay area. He co-founded DavisWiki.org in 2004 and is currently co-directing the LocalWiki.org effort. For the past several years he has worked on a variety of non-profit efforts to engage everyday citizens. He oversaw the development of the popular VideoTheVote.org, the world's largest coordinated video documentation project, and was the lead developer at Citizen Engagement Laboratory, a non-profit focused on empowering traditionally underrepresented constituencies. He is a graduate of the University of California, Davis, with a bachelor's in Mathematics.

August 24 2012


This Week in Review: Twitter’s ongoing war with developers, and plagiarism and online credibility

[Since the review was off last week, this week's review covers the last two weeks.]

More Twitter restrictions for developers: Twitter continued to tighten the reins on developers building apps and services based on its platform with another change to its API rules last week. Most of it is pretty incomprehensible to non-developers, but Twitter did make itself plain at one point, saying it wants to limit development by engagement-based apps that market to consumers, rather than businesses. (Though a Twitter exec did clarify that at least two of those types of services, Storify and Favstar, were in the clear.)

The Next Web’s Matthew Panzarino clarified some of the technical jargon, and Marketing Land’s Danny Sullivan explained whom this announcement means Twitter likes and doesn’t like, and why. ReadWriteWeb’s Dan Frommer gave the big-picture reason for Twitter’s increasing coldness toward developers — it needs to generate tons more advertising soon if it wants to stay independent, and the way to do that is to keep people on Twitter, rather than on Twitter-like apps and services. (Tech entrepreneur Nova Spivack said that rationale doesn’t fly, and came up with a few more open alternatives to allow Twitter to make significant money.)

That doesn’t mean developers were receptive to the news, though. Panzarino said these changes effectively kill the growth of third-party products built on Twitter’s platform, and Instapaper founder Marco Arment argued that Twitter has made itself even harder to work with than the famously draconian Apple. Eliza Kern and Mathew Ingram of GigaOM talked to developers about their ambivalence with Twitter’s policies and put Twitter’s desire for control in perspective, respectively.

Several observers saw these changes as a marker of Twitter’s shift from user-oriented service to cog in the big-media machine. Tech designer Stowe Boyd argued Twitter “is headed right into the central DNA of medialand,” and tech blogger Ben Brooks said Twitter is now preoccupied with securing big-media partnerships: “Twitter has sold out. They not only don’t care about the original users, but they don’t even seem to care much for the current users — there’s a very real sense that Twitter needs to make money, and they need to make that money yesterday.” Developer Rafe Colburn pointed out how many of Twitter’s functions were developed by its users, and developer Nick Bruun said many of the apps that Twitter is going after don’t mimic its user experience, but significantly improve it. Killing those apps and streamlining the experience, said GigaOM’s Mathew Ingram, doesn’t help users, but hurts them.

Part of the problem, a few people said, was Twitter’s poor communication. Harry McCracken of Time urged Twitter to communicate more clearly and address its users alongside its developers. Tech entrepreneur Anil Dash offered a rewritten (and quite sympathetic) version of Twitter’s guidelines.

There’s another group of developers affected by this change — news developers. The Lab’s Andrew Phelps surveyed what the changes will entail for various Twitter-related news products (including a couple of the Lab’s own), and journalism professor Alfred Hermida warned that they don’t bode well for the continued development of open, networked forms of journalism.

Plagiarism, credibility, and the web: Our summer of plagiarism continues unabated: Wired decided to keep Jonah Lehrer on as a contributor after his plagiarism scandal, though the magazine said it’s still reviewing his work and he has no current assignments. Erik Wemple of The Washington Post lamented the lack of consequences for Lehrer’s journalistic sins, and both he and Poynter’s Craig Silverman wondered how the fact-checking process for his articles would go. Meanwhile, Lehrer was accused by another source of fabricating quotes and also came under scrutiny for mischaracterizing scientific findings.

The other plagiarizer du jour, Time and CNN’s Fareed Zakaria, has come out much better than Lehrer so far. Zakaria resigned as a Yale trustee, but Time, CNN, and The Washington Post (for whom he contributes columns) all reinstated him after reviewing his work for them, with Time declaring it was satisfied that his recent lapse was an unintentional error. However, a former Newsweek editor said he ghost-wrote a piece for Zakaria while he was an editor there, though he told the New York Observer and Poynter that he didn’t see it as a big deal.

Some defended Zakaria on a variety of grounds. Poynter’s Andrew Beaujon evaluated a few of the arguments and found only one might have merit — that the plagiarism might have resulted from a research error by one of his assistants. The Atlantic’s Robinson Meyer, meanwhile, argued that plagiarism has a long and storied history in American journalism, but hasn’t always been thought of as wrong.

Others saw the responses by news organizations toward both Zakaria and Lehrer as insufficient. Poynter’s Craig Silverman argued that those responses highlighted a lack of consistency and transparency (he and Kelly McBride also wrote a guide for news orgs on how to handle plagiarism), while journalism professor Mark Leccese said Zakaria’s employers should have recognized the seriousness of plagiarism and gone further, and Steven Brill at the Columbia Journalism Review called for more details about the nature of Zakaria’s error.

A New York Times account of Zakaria’s error focused on his hectic lifestyle, filled with the demands of being a 21st-century, multiplatform, personally branded pundit. At The Atlantic, book editor and former journalist Peter Osnos focused on that pressure for a pundit to publish on all platforms for all people as the root of Zakaria’s problem.

The Times’ David Carr pinpointed another factor — the availability of shortcuts to credibility on the web that allowed Lehrer to become a superstar before he learned the craft. (Carr found Lehrer’s problems far more concerning than Zakaria’s.) At Salon, Michael Barthel also highlighted the difference between traditional media and web culture, arguing that the problem for people like Zakaria is their desire to inhabit both worlds at once: “The way journalists demonstrate credibility on the Web isn’t better than how they do in legacy media. It’s just almost entirely different. For those journalists and institutions caught in the middle, that’s a real problem.” GigaOM’s Mathew Ingram argued that linking is a big part of the web’s natural defenses against plagiarism.

Untruths and political fact-checking: The ongoing discussion about fact-checking and determining truth and falsehood in political discourse got some fresh fuel this week with a Newsweek cover story by Harvard professor Niall Ferguson arguing for President Obama’s ouster. The piece didn’t stand up well to numerous withering fact-checks (compiled fairly thoroughly by Newsweek partner The Daily Beast and synthesized a bit more by Ryan Chittum of the Columbia Journalism Review).

Ferguson responded with a rebuttal in which he argued that his critics “claim to be engaged in ‘fact checking,’ whereas in nearly all cases they are merely offering alternative (often silly or skewed) interpretations of the facts.” Newsweek’s editor, Tina Brown, likewise referred to the story as opinion (though not one she necessarily agreed with) and said there isn’t “a clear delineation of right and wrong here.”

Aside from framing the criticism as a simple difference of opinion rather than an issue of factual (in)correctness, Newsweek also acknowledged to Politico that it doesn’t have fact-checkers — that its editors “rely on our writers to submit factually accurate material.”  Poynter’s Craig Silverman provided some of the history behind that decision, which prompted some rage from Charles Apple of the American Copy Editors Society. Apple asserted that any news organization that doesn’t respect its readers or public-service mission enough to ensure their work is factually accurate needs to leave the business. The Atlantic’s Ta-Nehisi Coates said the true value of fact-checkers comes in the culture of honesty they create.

Mathew Ingram of GigaOM wondered if that fact-checking process might be better done in public, where readers can see the arguments and inform themselves. In an earlier piece on campaign rhetoric, Garance Franke-Ruta of The Atlantic argued that in an era of willful, sustained political falsehood, fact-checking may be outliving its usefulness, saying, “One-off fact-checking is no match for the repeated lie.” The Lab’s Andrew Phelps, meanwhile, went deep inside the web’s leading fact-checking operation, PolitiFact.

The Times’ new CEO and incremental change: The New York Times Co. named a new CEO last week, and it was an intriguing choice — former BBC director general Mark Thompson. The Times’ article on Thompson focused on his digital expansion at the BBC (which was accompanied by a penchant for cost-cutting), as well as his transition from publicly funded to ad-supported news. According to the International Business Times, those issues were all sources of skepticism within the Times newsroom. Bloomberg noted that Thompson will still be subject to Arthur Sulzberger’s vision for the Times, and at the Guardian, Michael Wolff said Thompson should complement that vision well, as a more realistic and business-savvy counter to Sulzberger.

The Daily Beast’s Peter Jukes pointed out that many of the BBC’s most celebrated innovations during Thompson’s tenure were not his doing. Robert Andrews of paidContent also noted this, but said Thompson’s skill lay in being able to channel that bottom-up innovation to fit the BBC’s goals. Media analyst Ken Doctor argued that the BBC and the Times may be more alike than people think, and Thompson’s experience at the former may transfer over well to the latter: “Thompson brings the experience at moving, too slowly for some, too dramatically for others, a huge entity.” But Mathew Ingram of GigaOM said that kind of approach won’t be enough: “The bottom line is that a business-as-usual or custodial approach is not going to cut it at the NYT, not when revenues are declining as rapidly as they have been.”

Joe Pompeo of Capital New York laid out a thorough description of the Sulzberger-led strategy Thompson will be walking into: Focusing on investment in the Times, as opposed to the company’s other properties, but pushing into mobile, video, social, and global reach, rather than print. And Bloomberg’s Edmund Lee posited the idea that the Times could be in an increasingly good position to go private.

The Assange case and free speech vs. women’s rights: WikiLeaks’ Julian Assange cleared another hurdle last week — for now — in his fight to avoid extradition to Sweden on sexual assault accusations when Ecuador announced it would grant him asylum. Assange has been staying in the Ecuadorean Embassy in London for two months, but British officials threatened to arrest Assange in the embassy. Ecuador’s decision gives him immunity from arrest on Ecuadorean soil (which includes the embassy).

Assange gave a typically defiant speech for the occasion, but the British government was undeterred, saying it plans to resolve the situation diplomatically and send Assange to Sweden. Ecuador’s president said an embassy raid would be diplomatic suicide for the U.K., and Techdirt’s Mike Masnick was appalled that Britain would even suggest it. Filmmakers Michael Moore and Oliver Stone argued in The New York Times that Assange deserves support as a free-speech advocate, while Gawker’s Adrian Chen said the sexual assault case has nothing to do with free speech. Laurie Penny of The Independent looked at the way free speech and women’s rights are being pitted against each other in this case. Meanwhile, Glenn Greenwald of The Guardian excoriated the press for their animosity toward Assange.

Reading roundup: We’ve already covered a bunch of stuff over the past week and a half, and there’s lots more to get to, so here’s a quick rundown:

— Twitter and Blogger co-founder Evan Williams announced the launch of Medium, a publishing platform that falls somewhere between microblogging and blogging. The Lab’s Joshua Benton has the definitive post on what Medium might be, Dave Winer outlined his hopes for it, and The Awl’s Choire Sicha wrote about the anti-advertising bent at sites like it.

— A few social-news notes: Two features from the Huffington Post and the Lab on BuzzFeed’s ramped-up political news plans; TechCrunch’s comparison of BuzzFeed, Reddit, and Digg; and a feature from the Daily Dot on Reddit and the future of social journalism.

— The alt-weekly The Village Voice laid off staffers late last week, prompting Jim Romenesko to report that the paper is on the verge of collapse and Buzzfeed’s Rosie Gray to chronicle its demise. Poynter’s Andrew Beaujon said the paper still has plenty left, and The New York Times’ David Carr said the problem is that the information ecosystem has outgrown alt-weeklies.

— Finally, three great food-for-thought pieces, Jonathan Stray here at the Lab on determining proper metrics for journalism, media consultant Mark Potts on a newspaper exec’s 20-year-old view of the web, and Poynter’s Matt Thompson on the role of the quest narrative in journalism.

Photo of Jonah Lehrer by PopTech and drawing of Julian Assange by Robert Cadena used under a Creative Commons license.

August 07 2012


Archiving tweets isn’t as easy as it seems

Between deleted tweets and posts that disappear into the timeline void, it's a pain to keep track of and find information more than a few days old on Twitter. So how hard could it really be to build a capable Twitter archiver for reporters?

April 27 2012


January 05 2012


Feed Your PANDA With New APIs and Excel Import


Last time I wrote it was to solicit ideas for PANDA's API. We've since implemented those ideas, and we've just released our third alpha, which includes a complete writable API, demo scripts showing how to import from three different data sources, and the ability to import data from Excel spreadsheets.

The PANDA project aims to make basic data analysis quick and easy for news organizations, and make data sharing simple.

Try Alpha 3 now.

Hello, Write API

Our new write API is designed to be as simple and consistent as possible. We've gone to great lengths to illustrate how it works in our new API documentation. We've also provided three example scripts showing how to populate PANDA with data from:

Using these scripts as a starting point, any programmer with a little bit of Python knowledge should be able to easily import data from an SQL database, local file or any other arcane data source they can conjure up in the newsroom.
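As a rough illustration of what such an import script has to do, here is a minimal sketch in Python 3. It is not one of the three example scripts, and it does not talk to PANDA at all: it only reads rows from a database and groups them into JSON batches. The payload shape (an "objects" list of {"data": [...]} records), the batch size and the function name rows_to_batches are assumptions made for illustration; the API documentation is the authority on the real format and endpoints.

```python
import json
import sqlite3


def rows_to_batches(rows, batch_size=2):
    """Yield JSON strings, each holding up to batch_size rows.

    The {"objects": [{"data": [...]}]} shape is an assumed, illustrative
    payload, not a confirmed description of PANDA's write API.
    """
    batch = []
    for row in rows:
        # Every cell is sent as a string; NULLs become empty strings.
        batch.append({"data": ["" if value is None else str(value) for value in row]})
        if len(batch) == batch_size:
            yield json.dumps({"objects": batch})
            batch = []
    if batch:
        yield json.dumps({"objects": batch})


# Demo with a throwaway in-memory database standing in for a newsroom source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE budget (department TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO budget VALUES (?, ?)",
    [("Parks", 1200.5), ("Roads", None), ("Libraries", 310.0)],
)
for payload in rows_to_batches(conn.execute("SELECT department, amount FROM budget")):
    print(payload)  # a real script would send each payload to its PANDA instance
conn.close()
```

Batching keeps each request small, which matters when the source table is large; the right batch size depends on the server and is left as a parameter.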

Excel support

Also included in this release is support for importing from Excel .xls and .xlsx files. It's difficult to promise that this support will work for every Excel file anyone can find to throw at it, but we've had good results with files produced from both Windows and Mac versions of Excel, as well as from OpenOffice on Mac and Linux.

Our Alpha 4 release will be coming at the end of January, followed quickly by Beta 1 around the time of NICAR. To see what we have planned, check out our Release schedule.

December 08 2011


PANDA Project Releases Alpha 2 (and Needs Your API Ideas!)

Last Friday, we closed out our eighth iteration of PANDA Project development and published our second alpha. We've added a login/registration system, dataset search, complex query support and a variety of other improvements. You can try out the new release now by visiting our test site here.


The PANDA project aims to make basic data analysis quick and easy for news organizations, and make data sharing simple. We've incorporated much of the feedback we got in response to the first release, though some significant features, such as support for alternative file formats, have been intentionally put off while we focus on building core functionality. We will have these in place before we release our first beta at the National Institute for Computer-Assisted Reporting conference in February.

As always, you can report bugs on our Github Issue tracker or email your comments to me directly.

Building a complete API

PANDA is built on a robust API so that it can be extended without modification. This is a tried-and-true way to design web applications, but it's also a permanent hedge against the project becoming bloated or obsolete. By making PANDA extensible, we encourage other developers to add features that they need, but which may not fit our vision of what belongs in the core offering -- user-facing content, for example. Ideally, this will lead to a community of expert users who sustain the project after our grant is finished.

Over the next month, I'll be adding the trickiest, but most exciting, part of the API: a mechanism for programmatically adding data to the system. Once this is complete, developers will be able to write scripts which insert new data into their PANDA instance and have it immediately be made searchable for reporters. Here are some example use-cases:

This last use-case is particularly exciting. One feature we have on the roadmap is to investigate how we can integrate directly with ScraperWiki. This is speculative at the moment, but has the potential to make the API useful even to novice developers who might not be entirely comfortable writing shell scripts or cron jobs.

I'm really excited to be building out this feature. If you've got ideas for how you might use it or use-cases you want to make sure we support, let me know!

Image courtesy of Flickr user woychuk.

December 07 2011


How to scrape and parse Wikipedia

Today’s exercise is to create a list of the longest and deepest caves in the UK from Wikipedia. Wikipedia pages for geographical structures often contain Infoboxes (that panel on the right hand side of the page).

The first job was for me to design a Template:Infobox_ukcave which was fit for purpose. Why ukcave? Well, if you’ve got a spare hour you can check out the discussion considering its deletion between the immovable object (American cavers who believe cave locations are secret) and the immovable force (Wikipedian editors who believe that you can’t have two templates for the same thing, except when they are in different languages).

But let’s get on with some Wikipedia parsing. Here’s what doesn’t work:

import urllib
print urllib.urlopen("http://en.wikipedia.org/wiki/Aquamole_Pot").read()

because it returns a rather ugly error, which at the moment is: “Our servers are currently experiencing a technical problem.”

What they would much rather you do is go through the Wikipedia API and get the raw source code in XML form without overloading their servers.

To get the text from a single page requires the following code:

import lxml.etree
import urllib

title = "Aquamole Pot"

params = { "format":"xml", "action":"query", "prop":"revisions", "rvprop":"timestamp|user|comment|content" }
params["titles"] = "API|%s" % urllib.quote(title.encode("utf8"))
qs = "&".join("%s=%s" % (k, v)  for k, v in params.items())
url = "http://en.wikipedia.org/w/api.php?%s" % qs
tree = lxml.etree.parse(urllib.urlopen(url))
revs = tree.xpath('//rev')

print "The Wikipedia text for", title, "is"
print revs[-1].text

Note how I am not using urllib.urlencode to convert params into a query string. This is because the standard function converts all the ‘|’ symbols into ‘%7C’, which the Wikipedia api site doesn’t accept.
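For anyone on Python 3 (the snippets in this post are Python 2), the same effect can probably be had without hand-joining the string, because urllib.parse.urlencode takes a safe argument listing characters to leave unescaped. The sketch below is an assumption-laden rewrite, not the code I ran: build_query_url is a name made up here, it asks only for the one title, and it does not check whether the API still objects to ‘%7C’.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """Build the revisions query URL for one page title, keeping '|' literal."""
    params = {
        "format": "xml",
        "action": "query",
        "prop": "revisions",
        "rvprop": "timestamp|user|comment|content",
        "titles": title,
    }
    # safe="|" tells urlencode not to turn the pipe symbols into %7C.
    # Spaces in the title are still encoded (as '+').
    return API_URL + "?" + urlencode(params, safe="|")


print(build_query_url("Aquamole Pot"))
```

Everything other than the pipe is still escaped, so titles with spaces or non-ASCII characters stay safe to put in a URL.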

The result is:

{{Infobox ukcave
| name = Aquamole Pot
| photo =
| caption =
| location = [[West Kingsdale]], [[North Yorkshire]], England
| depth_metres = 113
| length_metres = 142
| coordinates =
| discovery = 1974
| geology = [[Limestone]]
| bcra_grade = 4b
| gridref = SD 698 784
| location_area = United Kingdom Yorkshire Dales
| location_lat = 54.19082
| location_lon = -2.50149
| number of entrances = 1
| access = Free
| survey = [http://cavemaps.org/cavePages/West%20Kingsdale__Aquamole%20Pot.htm cavemaps.org]
'''Aquamole Pot''' is a cave on [[West Kingsdale]], [[North Yorkshire]],
England wih which was first discovered from the
bottom by cave diving through 550 feet of
sump from [[Rowten Pot]] in 1974....

This looks pretty structured. All ready for parsing. I’ve written a nice complicated recursive template parser that I use in wikipedia_utils, which makes it easy to extract all the templates from the page in the following way:

import scraperwiki
wikipedia_utils = scraperwiki.swimport("wikipedia_utils")

title = "Aquamole Pot"

val = wikipedia_utils.GetWikipediaPage(title)
res = wikipedia_utils.ParseTemplates(val["text"])
print res               # prints everything we have found in the text
infobox_ukcave = dict(res["templates"]).get("Infobox ukcave")
print infobox_ukcave    # prints just the ukcave infobox

This now produces the following Python data structure that is almost ready to push into our database — after we have converted the length and depths from strings into numbers:

{0: 'Infobox ukcave', 'number of entrances': '1',
 'location_lon': '-2.50149',
 'name': 'Aquamole Pot', 'location_area': 'United Kingdom Yorkshire Dales',
 'geology': '[[Limestone]]', 'gridref': 'SD 698 784', 'photo': '',
 'coordinates': '', 'location_lat': '54.19082', 'access': 'Free',
 'caption': '', 'survey': '[http://cavemaps.org/cavePages/West%20Kingsdale__Aquamole%20Pot.htm cavemaps.org]',
 'location': '[[West Kingsdale]], [[North Yorkshire]], England',
 'depth_metres': '113', 'length_metres': '142', 'bcra_grade': '4b', 'discovery': '1974'}
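If you don’t have wikipedia_utils to hand, a much cruder version of the same idea, plus the string-to-number conversion just mentioned, might look like the sketch below. It is Python 3 rather than the Python 2 used above, the names parse_flat_infobox and to_numbers are invented here, and it only copes with a single flat template whose fields each sit on their own line. It is not the recursive parser.

```python
NUMERIC_FIELDS = ("depth_metres", "length_metres")


def parse_flat_infobox(wikitext):
    """Collect '| key = value' lines of the first template into a dict of strings."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("{{") and 0 not in fields:
            fields[0] = line[2:].strip()          # template name, stored under key 0
        elif line.startswith("}}"):
            break                                 # end of the template
        elif line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)   # split on the first '=' only
            fields[key.strip()] = value.strip()
    return fields


def to_numbers(infobox, numeric_fields=NUMERIC_FIELDS):
    """Return a copy with the numeric fields as floats; blank or missing becomes None."""
    converted = dict(infobox)
    for key in numeric_fields:
        raw = converted.get(key, "")
        converted[key] = float(raw) if raw.strip() else None
    return converted


sample = """{{Infobox ukcave
| name = Aquamole Pot
| photo =
| depth_metres = 113
| length_metres = 142
}}
'''Aquamole Pot''' is a cave on [[West Kingsdale]]"""

print(to_numbers(parse_flat_infobox(sample)))
```

Turning blanks into None rather than 0 matters later: a cave with no recorded depth should not sort as if it were zero metres deep.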

Right. Now to deal with the other end of the problem. Where do we get the list of pages with the data?

Wikipedia is, unfortunately, radically categorized, so Aquamole_Pot is inside Category:Caves_of_North_Yorkshire, which is in turn inside Category:Caves_of_Yorkshire which is then inside
Category:Caves_of_England which is finally inside

So, in order to get all of the caves in the UK, I have to iterate through all the subcategories and all the pages in each category and save them to my database.

Luckily, this can be done with:

lcavepages = wikipedia_utils.GetWikipediaCategoryRecurse("Caves_of_the_United_Kingdom")
scraperwiki.sqlite.save(["title"], lcavepages, "cavepages")
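What that helper has to do underneath is walk the category tree. The sketch below (Python 3, unlike the snippets above) shows only the shape of such a walk, not the actual code in wikipedia_utils: walk_category and get_members are invented names, and the lookup is faked with a small dictionary so that it runs without touching Wikipedia. A real get_members would ask the API for a category’s subcategories and pages.

```python
def walk_category(root, get_members):
    """Return page titles found under root, following subcategories.

    get_members(category) must return (subcategories, pages); it is a
    hypothetical callable standing in for a real API lookup.
    """
    seen = set()      # categories already visited, so loops cannot trap us
    pages = []
    stack = [root]
    while stack:
        category = stack.pop()
        if category in seen:
            continue
        seen.add(category)
        subcategories, members = get_members(category)
        pages.extend(p for p in members if p not in pages)
        stack.extend(subcategories)
    return pages


# Illustrative stand-in data, using the category chain described above.
FAKE_TREE = {
    "Caves_of_the_United_Kingdom": (["Caves_of_England"], []),
    "Caves_of_England": (["Caves_of_Yorkshire"], []),
    "Caves_of_Yorkshire": (["Caves_of_North_Yorkshire"], []),
    "Caves_of_North_Yorkshire": ([], ["Aquamole_Pot", "Gaping_Gill"]),
}


def fake_get_members(category):
    return FAKE_TREE.get(category, ([], []))


print(walk_category("Caves_of_the_United_Kingdom", fake_get_members))
```

The seen set is there because category graphs can contain loops, and a walk with no such guard would never finish on one.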

All of this adds up to my current scraper wikipedia_longest_caves that extracts those infobox tables from caves in the UK and puts them into a form where I can sort them by length to create this table based on the query SELECT name, location_area, length_metres, depth_metres, link FROM caveinfo ORDER BY length_metres desc:

name                   location_area                   length_metres  depth_metres
Ease Gill Cave System  United Kingdom Yorkshire Dales  66000.0        137.0
Dan-yr-Ogof            Wales                           15500.0
Gaping Gill            United Kingdom Yorkshire Dales  11600.0        105.0
Swildon’s Hole         Somerset                        9144.0         167.0
Charterhouse Cave      Somerset                        4868.0         228.0
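To try the sorting query away from ScraperWiki, a local stand-in can be built with Python’s sqlite3 module. This is a sketch in Python 3 under a few assumptions: the caveinfo table is recreated in memory from the five rows shown here, the link column is left out, and longest_caves is a name invented for the example.

```python
import sqlite3

# The five rows from the table, deliberately out of order.
ROWS = [
    ("Gaping Gill", "United Kingdom Yorkshire Dales", 11600.0, 105.0),
    ("Ease Gill Cave System", "United Kingdom Yorkshire Dales", 66000.0, 137.0),
    ("Swildon’s Hole", "Somerset", 9144.0, 167.0),
    ("Dan-yr-Ogof", "Wales", 15500.0, None),   # no depth recorded
    ("Charterhouse Cave", "Somerset", 4868.0, 228.0),
]


def longest_caves(rows):
    """Load rows into an in-memory caveinfo table and return them longest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE caveinfo "
        "(name TEXT, location_area TEXT, length_metres REAL, depth_metres REAL)"
    )
    conn.executemany("INSERT INTO caveinfo VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT name, location_area, length_metres, depth_metres "
        "FROM caveinfo ORDER BY length_metres DESC"
    ).fetchall()
    conn.close()
    return result


for row in longest_caves(ROWS):
    print(row)
```

The lengths are stored as numbers on purpose: sorted as text, '9144.0' would come out ahead of '66000.0'.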

If I was being smart I could make the scraping adaptive, that is only updating the pages that have changed since the last scrape by using all the data returned by GetWikipediaCategoryRecurse(), but it’s small enough at the moment.
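For what it’s worth, the adaptive version would not need much. The sketch below (Python 3) assumes each page record carries a title and a last-modified timestamp under a key called "touched"; that field name, the sample timestamps and the function pages_needing_update are assumptions for illustration, not a description of what GetWikipediaCategoryRecurse() actually returns.

```python
def pages_needing_update(current_pages, last_scraped):
    """Return titles that are new or have changed since the previous scrape.

    current_pages: iterable of dicts with 'title' and 'touched' keys, where
                   'touched' is an ISO 8601 timestamp string (assumed field).
    last_scraped:  dict mapping title -> 'touched' value seen last time.
    """
    stale = []
    for page in current_pages:
        previous = last_scraped.get(page["title"])
        # ISO 8601 strings in the same format compare correctly as text.
        if previous is None or page["touched"] > previous:
            stale.append(page["title"])
    return stale


# Made-up records and timestamps, purely to exercise the function.
previous_run = {
    "Aquamole_Pot": "2011-11-01T10:00:00Z",
    "Gaping_Gill": "2011-11-20T08:30:00Z",
}
this_run = [
    {"title": "Aquamole_Pot", "touched": "2011-12-05T09:15:00Z"},   # edited since
    {"title": "Gaping_Gill", "touched": "2011-11-20T08:30:00Z"},    # unchanged
    {"title": "Rowten_Pot", "touched": "2011-12-01T12:00:00Z"},     # never scraped
]

print(pages_needing_update(this_run, previous_run))
```

Only the titles it returns would need fetching and parsing again; everything else could be left as it is in the database.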

So, why not use DBpedia?

I know what you’re saying: Surely the whole of DBpedia does exactly this, with their parser?

And that’s fine if you don’t mind your updates arriving no more often than every 6 months, which prevents you from getting any feedback when adding new caves into Wikipedia, like Aquamole_Pot.

And it’s also fine if you don’t mind being stuck with the naïve semantic web notion that the boundary between entities is a simple, straightforward and general concept, rather than what it really is: probably the one deep and fundamental question within any specific domain of knowledge.

I mean, what is the definition of a singular cave, really? Is it one hole in the ground, or is it the vast network of passages which link up into one connected system? How good do those connections have to be? Are they defined hydrologically by dye tracing, or is a connection defined as the passage of one human body getting itself from one set of passages to the next? In the extreme cases this can be done by cave diving through an atrocious sump which no one else is ever going to do again, or by digging and blasting through a loose boulder choke that collapses in days after one nutcase has crawled through. There can be no tangible physical definition. So we invent the rules for the definition. And break them.

So while theoretically all the caves on Leck Fell and Easgill have been connected into the Three Counties System, we’re probably going to agree to continue to list them as separate historic caves, as well as some sort of combined listing. And that’s why you’ll get further treating knowledge domains as special cases.

September 18 2011


First step - Google+ APIs now available to developers: what people shared publicly

Google's initial API release is focused on public data only: it is read-only and lets you retrieve information that people have shared publicly on Google+. API calls are limited by what Google calls a courtesy usage quota.

May 16 2011


It’s SQL. In a URL.

Squirrelled away amongst the other changes to ScraperWiki’s site redesign, we made substantial improvements to the external API explorer.

We’re going to concentrate on the SQLite function here as it is the most important, but as you can see on the right there are other functions for getting out scraper metadata.

Zarino and Julian have made it a little bit slicker to find out the URLs you can use to get your data out of ScraperWiki.

1. As you type into the name field, ScraperWiki now does an incremental search to help you find your scraper, like this.

2. After you select a scraper, it shows you its schema. This makes it much easier to know the names of the tables and columns while writing your query.

3. When you’ve edited your SQL query, you can run it as before. There’s also now a button to quickly and easily copy the URL that you’ve made for use in your application.

You can get to the explorer with the “Explore with ScraperWiki API” button at the top of every scraper’s page. This makes it quite useful for quick and dirty queries on your data, as well as for finding the URLs for getting data into your own applications.
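If you would rather build such a URL in code than copy it from the explorer, the idea is simply to put the SQL, properly escaped, into a query-string parameter. The Python 3 sketch below shows that and nothing more: the base URL is a placeholder on example.org, and the parameter names format, name and query are assumptions, so copy the real URL from the explorer rather than trusting these.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder endpoint, NOT the real ScraperWiki address.
BASE_URL = "https://api.example.org/datastore/sqlite"


def sql_url(scraper_name, query, fmt="json"):
    """Return a URL carrying the SQL query as an escaped parameter.

    The parameter names are assumed for illustration only.
    """
    params = {"format": fmt, "name": scraper_name, "query": query}
    return BASE_URL + "?" + urlencode(params)


url = sql_url(
    "wikipedia_longest_caves",
    "SELECT name, length_metres FROM caveinfo ORDER BY length_metres DESC LIMIT 5",
)
print(url)

# Decoding the URL again gives back the original SQL unchanged.
print(parse_qs(urlparse(url).query)["query"][0])
```

Letting urlencode do the escaping means spaces, commas and quotes in the SQL cannot break the URL.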

Let us know when you do something interesting with data you’ve sucked out of ScraperWiki!

April 11 2011


Scrape it – Save it – Get it

I imagine I’m talking to a load of developers. Which is odd seeing as I’m not a developer. In fact, I decided to lose my coding virginity by riding the ScraperWiki digger! I’m a journalist interested in data as a beat so all I need to do is scrape. All my programming will be done on ScraperWiki, as such this is the only coding home I know. So if you’re new to ScraperWiki and want to make the site a scraping home-away-from-home, here are the basics for scraping, saving and downloading your data:

With these three simple steps you can take advantage of what ScraperWiki has to offer – writing, running and debugging code in an easy to use editor; collaborative coding with chat and user viewing functions; a dashboard with all your scrapers in one place; examples, cheat sheets and documentation; a huge range of libraries at your disposal; a datastore with API callback; and email alerts to let you know when your scrapers break.

So give it a go and let us know what you think!

March 31 2011


March 02 2011


Signals of churnalism?

Journalism warning labels by Tom Scott

On Friday I had quite a bit of fun with Churnalism.com, a new site from the Media Standards Trust which allows you to test how much of a particular press release has been reproduced verbatim by media outlets.

The site has an API, which got me thinking whether you might be able to ‘mash’ it with an RSS feed from Google News to check particular types of articles – and what ‘signals’ you might use to choose those articles.

I started with that classic PR trick: the survey. A search on Google News for “a survey * found” (the * is a wildcard, meaning it can be anything) brings some interesting results to start investigating.

Jon Bounds added a favourite of his: “hailed a success”.

And then it continued:

  • “Research commissioned by”
  • “A spokesperson said”
  • “Can increase your risk of” and “Can reduce your risk of”

On Twitter, Andy Williams added the use of taxonomies of consumers – although it was difficult to pin that down to a phrase. He also added “independent researchers”.

Contributors to the MySociety mailing list added:

  • “Proud to announce”
  • “Today launches”
  • “Revolutionary new”
  • “It was revealed today” (Andy Mabbett)
  • “According to research”, “research published today” and “according to a new report”

And of course there is “A press release said”.

Signal – or sign?

The idea kicked off a discussion on Twitter on whether certain phrases were signals of churnalism, or just journalistic cliches. The answer, of course, is both.

By brainstorming for ‘signals’ I wasn’t arguing that any material using these phrases would be guilty of churnalism – or even the majority – just that they might represent one way of narrowing your sample. Once you have a feed of stories containing “Revolutionary new” you can then use the API to test what proportion of those articles are identical to the text in a press release – or another news outlet.

The signal determines the sample, the API calculates the results.
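That two-stage idea might be sketched as follows. Nothing here calls the real Churnalism API: `overlap_score` is a crude local stand-in that counts shared five-word runs, and the article and press release texts are invented for the example.

```python
SIGNALS = ["revolutionary new", "research commissioned by", "hailed a success"]


def select_sample(articles, signals=SIGNALS):
    """Stage 1: keep only articles whose text contains at least one signal phrase."""
    sample = []
    for article in articles:
        text = article["text"].lower()
        if any(signal in text for signal in signals):
            sample.append(article)
    return sample


def overlap_score(article_text, release_text, n=5):
    """Stage 2 stand-in: share of the article's n-word runs that also appear in the release."""
    def shingles(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    article_runs = shingles(article_text)
    release_runs = shingles(release_text)
    if not article_runs:
        return 0.0
    return len(article_runs & release_runs) / len(article_runs)


# Invented sample data, for illustration only.
articles = [
    {"title": "Widget launch", "text": "A revolutionary new widget was launched today by Acme Corp"},
    {"title": "Council meeting", "text": "The council met on Tuesday to discuss the annual budget plans"},
]
release = "A revolutionary new widget was launched today by Acme Corp"

for article in select_sample(articles):
    print(article["title"], overlap_score(article["text"], release))
```

In a real experiment the second function would be replaced by a call to the API, and the sample would come from a news search feed rather than a hard-coded list.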

Indeed, there’s an interesting research project to be done – perhaps using the Churnalism API – on whether articles containing the phrases above are more likely to include passages copied wholesale from press releases than a general feed of stories from Google News.

(Another research project might involve looking at press releases to identify common phrases used by press officers that might be used by the API)

You may have another opinion of course – or other phrases you might suggest?

January 20 2011


Social Actions API, Semantic Web, and Linked Open Data: An Interview with Peter Deitz

Peter Deitz is a long-time member and contributor in the NetSquared community; he started the NetSquared Montreal group and his Social Actions project was a winner in the 2008 N2Y3 Mashup Challenge. Over the last few years, we have watched and supported the growth of Social Actions, including partnering for the Change the Web Challenge in 2009 - a Partner Challenge designed to tap into the NetSquared Community to find innovative ways of using the Social Actions API and data stream. We are really excited about the latest developments to the Social Actions API and the larger implications of what these updates mean for powering open data and supporting action around the world. To learn more about it, I caught up with Peter earlier this week to get all the details and am excited to share them here first!

Hear from Peter Deitz about the Social Actions API!

Let’s start at the beginning: What is Social Actions and where does the API come in?

I describe Social Actions as an aggregation of actions people can take on any issue that’s built to be highly distributable across the social web. We pull in donation opportunities, volunteer positions, petitions, events, and other actions from 60+ different sources. That’s today. A few years ago, we had just a handful of pioneering platforms in microphilanthropy.

The Social Actions project began in 2006. I wanted to make some kind of contribution to the world of microphilanthropy. My intent was to inventory every interesting action I came across to make it easier for people to engage in the causes they cared about. There wasn’t much scalability in the way I was pursuing the project.

In 2007, I realized that a much more effective way to aggregate interesting actions would be to subscribe to RSS feeds from trusted sources. I wrote about the potential for aggregating RSS feeds of giving opportunities in a blog post called, Why We Need Group Fundraising RSS Feeds. Three months later I had a prototype platform aggregating actions from RSS feeds, with a search element around that content.

Around the time of the Nonprofit Technology Network’s 2008 NTC conference, an even brighter light bulb went on. I remember sitting in a session by Kurt Voelker of ForumOne Communications, Tompkins Spann of Convio, and Jeremy Carbaugh of The Sunlight Foundation. They were talking about APIs. (API stands for Application Programming Interface, and refers broadly to the way one piece of software or dataset communicates with another.) In fact, the name of the session was "APIs for Beginners."

I knew I wanted to be in the session even without really knowing why. It was there that I realized my RSS-based process for aggregating actions could be so much more with a robust distribution component. I wrote a blog post called, Mashups, Open APIs, and the Future of Collaboration in the Nonprofit Tech Sector. I left that session knowing exactly the direction I wanted to take Social Actions.

And what would you describe as the social definition of Social Actions API - the purpose?

There’s a groundswell in interest, on the part of “non-nonprofit professionals,” to engage with social movements and causes. It’s well-documented at this point that people are hungry to engage with causes they care about in various forms.

The premise behind Social Actions is that there are enough actions floating around on the web that nonprofits produce, but that they’re not linked up properly or adequately syndicated. There are a million opportunities to take action on a cause you care about, but it’s not easy to find them. The Social Actions API attempts to address the distribution and syndication challenge while also encouraging nonprofits to make their actions more readily available.

What were the limitations that Social Actions and its API were hitting up against before the recent updates?

We have encountered a number of challenges over the years. Originally, adding actions manually was difficult. That challenge was resolved by creating a platform that used RSS feeds to pull in opportunities, which in turn evolved into the Social Actions API, allowing people to access the full dataset from any application that connected to it.

The vast majority of applications that have been built since 2008 match actions with related content: for example, by reading a blog post and searching the Social Actions dataset for related actions. The quality of the search results was limited by our querying capabilities and relevancy ranking. The results we were able to produce didn’t reflect the full contents of our database. They tended to reflect only the most recently-added actions, not the most relevant. As a result, we weren’t equipping developers with a platform that allowed for more accurate location- and issue-based searches. Until the recent enhancements, producing the best possible search results for a given phrase or keyword was our biggest challenge.

What did the recent updates accomplish, and how did the opportunity to make them come about?

The updates introduce Semantic Analysis and Natural Language Processing (NLP) capabilities to the Social Actions API and begin to connect Social Actions to the wider Linked Open Data community.

The enhancements effectively put Social Actions back on the cutting edge of social technology. These were changes that we had wanted to make for a long time. In Spring 2009, we were approached by a group that was building an advanced video + action platform and that wanted to draw on the Social Actions API. Link TV, in prototyping their ViewChange platform, noticed that the Social Actions API wasn’t producing the best possible results. They invited us to explore with them what would be involved in updating our platform so that ViewChange could feature more relevant results.

Link TV helped us articulate the changes that would need to occur and then connected us with a funder who could underwrite what amounted to a very significant enhancement to our code base. In one month, we had approximately as large an investment in the technology as we’d had in total up until that point. It has been incredibly exciting to see how open source projects like Social Actions tend to grow in fits and bursts, depending on the demands and resources made available by users.

What do “Semantic Analysis” and “Natural Language Processing” mean, and how do they make the Social Actions API better?

Semantic Analysis and Natural Language Processing both have to do with the process of identifying the meaning of a collection of words together. Semantic analysis, for example, can help to identify the meaning of a phrase like “poverty relief” as distinct from what “poverty” and “relief” mean independently. The Social Actions API now uses a tool called Zemanta to apply these processes when searching the actions contained in the dataset. As a result, we can say with more confidence what an action is about and where it is taking place. When searching for the phrase “poverty relief,” for example, not only are the search results more accurate, but Zemanta helps us to identify other actions that might not in fact use that phrase but are nonetheless linked in meaning to it. It’s a difficult concept to explain, but hopefully this makes sense.

And what does “Linked Open Data” refer to?

Just like in 2008 when I had an “aha moment” about APIs, in June 2009 I had an “aha moment” about Linked Open Data. I was presenting Social Actions at the Semantic Technology Conference (SemTech), describing how Social Actions was an open database and how we encouraged developers to build open source applications that distributed this data widely. Ivan Herman from W3C listened to the presentation and asked, “Why are you building something that’s so closed? Why aren’t you publishing this data in RDF?”

I was surprised to say the least. Defeated in fact. I had spent close to three years trying to build this open platform only to have someone more tech-savvy than me explain that what we had built was in fact still a closed platform. It turns out I was at the epicenter of the Linked Open Data community. Their mission is to link the world’s knowledge in the same way that all of the world’s web pages have been linked to one another.

If you can imagine that today the web is a collection of links between pages, the web of tomorrow (proposed by these folks and Tim Berners-Lee) will be a collection of links between discrete knowledge, or datasets. Anyone will be able to follow the connection that’s been made between one repository of data and another the same way people can now hyperlink between one web page and another.

Linked Open Data essentially refers to building connections between these repositories in a standard format not unlike HTML and hypertext.

What role do APIs, and the people who build them, play in Linked Open Data?

The stewards of databases are no longer just asked to open up their datasets but to make them available in such a way that they link with other data repositories by design. In the case of Social Actions, Ivan from the W3C was effectively saying, “It’s great you have all of this data on actions people can take, but what are you doing to link that data with other datasets? What are you doing to help people make the connection between ‘poverty relief’ as an issue, for example, and existing data sets on the prevalence of poverty in a specific location?”

The Social Actions API now cross-references issues and locations with universal identifiers that have been assigned to them. Just like you might cross-reference the subject of a book with a Dewey Decimal number, we are now cross-referencing each action with a universal identifier that helps to link it to related data. Using Zemanta, we are able to provide URIs (Uniform Resource Identifiers) from Freebase and DBpedia that make the connection between actions in our system and other material on the web that relates to the same topic.

You can see examples of this at http://search.socialactions.com. Search for any phrase. Below each result you’ll see a link to “Entities.”
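To make the cross-referencing idea concrete, here is a minimal in-memory sketch. It is not the Social Actions API: the record layout, the field name `entities` and the three actions are invented, and the URIs are used only as example identifiers. It shows how tagging each action with entity URIs lets an application ask for everything that matches a given URI.

```python
from collections import defaultdict

# Invented records: the structure and the actions are illustrative only.
ACTIONS = [
    {"id": 1, "title": "Donate to a food bank",
     "entities": ["http://dbpedia.org/resource/Poverty"]},
    {"id": 2, "title": "Sign a clean water petition",
     "entities": ["http://dbpedia.org/resource/Drinking_water"]},
    {"id": 3, "title": "Volunteer at a shelter",
     "entities": ["http://dbpedia.org/resource/Poverty",
                  "http://dbpedia.org/resource/Homelessness"]},
]


def build_index(actions):
    """Map each entity URI to the ids of the actions tagged with it."""
    index = defaultdict(list)
    for action in actions:
        for uri in action["entities"]:
            index[uri].append(action["id"])
    return index


INDEX = build_index(ACTIONS)


def actions_for(uri):
    """Return the ids of every action linked to the given URI (empty list if none)."""
    return INDEX.get(uri, [])


print(actions_for("http://dbpedia.org/resource/Poverty"))
```

Because the lookup key is a shared identifier rather than a keyword, an action that never uses the word "poverty" in its title can still be found, and any other dataset that uses the same URI can be joined to it.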

Can you tell me more about what ViewChange has done?

ViewChange is an example of an application that queries our actions using Freebase and DBpedia URIs as well as traditional keywords and phrases. The application says to Social Actions, “Show me everything that matches this URI.” The same query is submitted to the Social Actions API as is submitted to any data repository - news articles, videos, blog posts, etc. It’s truly commendable that Link TV, through the ViewChange project, has driven these enhancements on our platform.

A lot is also owed to Doug Puchalski, a programmer with Definition who helped lead the development of ViewChange.

To you, what might the future look like for people who want to take action on the causes they care about?

The technology exists for us to do really amazing things when it comes to matching people with actions they can take to make a difference. The technology itself is advancing, opening up more possibilities for even smarter applications.

The future of social technology, specifically creative implementations of the Social Actions API and similar open source platforms, is very exciting provided nonprofits and foundations continue to make rich data available and link it up with other repositories in the way I’ve attempted to describe. The future is also very bright if we continue to experiment with how these linked data repositories can be deployed for forms of community engagement that we would not have thought possible a few years ago.

If everything goes incredibly well in the coming years, what might emerge is a ubiquitous infrastructure of enabling technology and complementary applications that continuously present individuals with meaningful and relevant opportunities to enact change.


The Social Actions API – a pioneering open source project since 2008 – continues its boundary-pushing agenda by embracing the semantic web and contributing to the Linked Open Data cloud, encouraging the sector as a whole to leverage open source software and linked data for greater impact.

Visit socialactions.com today to learn more!

December 16 2010


LIVE: Linked data and the semantic web

We’ll have Matt Caines and Nick Petrie from Wannabe Hacks liveblogging for us at news:rewired all day. Follow individual posts on the news:rewired blog for up to date information on all our sessions.

We’ll also have blogging over the course of the day from freelance journalist Rosie Niven.

September 23 2010


“The mass market was a hack”: Data and the future of journalism

The following is an unedited version of an article written for the International Press Institute report ‘Brave News Worlds’ (PDF)

For the past two centuries journalists have dealt in the currency of information: we transmuted base metals into narrative gold. But information is changing.

At first, the base metals were eye witness accounts, and interviews. Later we learned to melt down official reports, research papers, and balance sheets. And most recently our alloys have been diluted by statements and press releases.

But now journalists are having to get to grips with a new type of information: data. And this is a very rich seam indeed.

Data: what, how and why

Data is a broad term so I should define it here: I am not talking here about statistics or numbers in general, because those are nothing new to journalists. When I talk about data I mean information that can be processed by computers.

This is a crucial distinction: it is one thing for a journalist to look at a balance sheet on paper; it is quite another to be able to dig through those figures on a spreadsheet, or to write a programming script to analyse that data, and match it to other sources of information. We can also more easily analyse new types of data, such as live data, large amounts of text, user behaviour patterns, and network connections.

And that, for me, is hugely important. Indeed, it is potentially transformational. Adding computer processing power to our journalistic arsenal allows us to do more, faster, more accurately, and with others. All of which opens up new opportunities – and new dangers. Things are going to change.

We’ve had over 40 years to see this coming. The growth of the spreadsheet and the database from the 1960s onwards kicked things off by making it much easier for organisations – including governments – to digitise information from what they spent our money on to how many people were being treated for which diseases, and where.

In the 1990s the invention of the world wide web accelerated the growth of the data at journalists’ disposal by providing a platform for those spreadsheets and databases to be published and accessed by both humans and computer programs – and a network to distribute it.

And now two cultural movements have combined to add a political dimension to the spread of data: the open data movement, and the linked data movement. Journalists should be familiar with these movements: the arguments that they have developed in holding power to account are a lesson in dealing with entrenched interests, while their experiments with the possibilities of data journalism show the way forward.

While the open data movement campaigns for important information – such as government spending, scientific information and maps – to be made publicly available for the benefit of society both democratically and economically, the linked data movement (championed by the inventor of the web, Sir Tim Berners-Lee) campaigns for that data to be made available in such a way that it can be linked to other sets of data so that, for instance, a computer can see that the director of a company named in a particular government contract is the same person who was paid as a consultant on a related government policy document. Advocates argue that this will also result in economic and social benefits.
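The company director example can be sketched in a few lines. This is a toy illustration, not real linked data tooling: the two datasets, the field names and the `example.org` identifiers are all invented. The point is only that when both datasets name a person by the same identifier, a program can make the connection without anyone matching names by hand.

```python
# Invented datasets: every record and identifier here is made up for illustration.
contracts = [
    {"contract": "C-001", "company": "Acme Ltd",
     "director": "http://example.org/person/42"},
    {"contract": "C-002", "company": "Bolt plc",
     "director": "http://example.org/person/77"},
]
consultants = [
    {"policy_doc": "P-9", "consultant": "http://example.org/person/42"},
]


def find_overlaps(contracts, consultants):
    """Return (contract, person, policy_doc) wherever a director was also a paid consultant."""
    docs_by_person = {}
    for row in consultants:
        docs_by_person.setdefault(row["consultant"], []).append(row["policy_doc"])

    overlaps = []
    for row in contracts:
        for doc in docs_by_person.get(row["director"], []):
            overlaps.append((row["contract"], row["director"], doc))
    return overlaps


print(find_overlaps(contracts, consultants))
```

Without a shared identifier the same check would rely on comparing names as text, which breaks on spelling variants and on different people who share a name.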

Concrete results of both movements can be seen in the US and UK – most visibly with the launch of government data repositories Data.gov and Data.gov.uk in 2009 and 2010 respectively – but also less publicised experiments such as Where Does My Money Go? – which uses data to show how public expenditure is distributed – and Mapumental – which combines travel data, property prices and public ratings of ‘scenicness’ to help you see at a glance which areas of a city might be the best place to live based on your requirements.

But there are dozens if not hundreds of similar examples in industries from health and science to culture and sport. We are experiencing an unprecedented release of data – some have named it ‘Big Data’ – and yet for the most part, media organisations have been slow to react.

That is about to change.

The data journalist

Over the last year an increasing number of news organisations have started to wake from their story-centric production lines and see the value of data. In the UK the MPs’ expenses story was seminal: when a newspaper dictates the news agenda for six weeks, the rest of Fleet Street pays attention – and at the core of this story was a million pieces of data on a disc. Since then every serious news organisation has expanded its data operations.

In the US the journalist-programmer Adrian Holovaty has pioneered the form with the data mashup ChicagoCrime.org and its open source offspring Everyblock, while Aron Pilhofer has innovated at the interactive unit at The New York Times, and new entrants from Talking Points Memo to ProPublica have used data as a launchpad for interrogating the workings of government.

To those involved, it feels like heady days. In reality, it’s very early days indeed. Data journalism takes in a huge range of disciplines, from Computer Assisted Reporting (CAR) and programming, to visualisation and statistics. If you are a journalist with a strength in one of those areas, you are currently exceptional. This cannot last for long: the industry will have to skill up, or it will have nothing left to sell.

Because while news organisations for years made a business out of being a middleman processing content between commerce and consumers, and government and citizens, the internet has made that business model obsolete. It is not enough any more for a journalist to simply be good at writing – or rewriting. There are a million others out there who can write better – large numbers of them working in PR, marketing, or government. While we will always need professional storytellers, many journalists are simply factory line workers.

So on a commercial level if nothing else, publishing will need to establish where the value lies in this new environment – and the new efficiencies to make journalism viable.

Data journalism is one of those areas. With a surfeit of public data being made available, there is a rich supply of raw material. The scarcity lies in the skills to locate and make sense of that – whether the programming skills to scrape it and compare it with other sources in the first place, the design flair to visualise it, or the statistical understanding to unpick it.

“The mass market was a hack”: opportunities for the new economy

The technological opportunity is massive. As processing power continues to grow, the ability to interrogate, combine and present data continues to increase. The development of augmented reality provides a particularly attractive publishing opportunity: imagine being able to see local data-based stories through your mobile phone, or indeed add data to the picture through your own activity. The experiments of the past five years will come to seem crude in comparison.

And then there is the commercial opportunity. Publishing is for most publishers, after all, not about selling content but about selling advertising. And here also data has taken on increasing importance. The mass market was a hack. As the saying goes: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.”

But Google, Facebook and others have used the measurability of the web to reduce the margin of error, and publishers will have to follow suit. It makes sense to put data at the centre of that – while you allow users to drill into the data you have gathered around automotive safety, the offering to advertisers is likely to say “We can display different adverts based on what information the user is interested in”, or “We can point the user to their local dealership based on their location”.

A collaborative future

I’m skeptical of the ability of established publishers to adapt to such a future but, whether they do or not, others will. And the backgrounds of journalists will have to change. The profession has a history of arts graduates who are highly literate but not typically numerate. That has already been the source of ongoing embarrassment for the profession as expert bloggers have highlighted basic errors in the way journalists cover science, health and finance – and it cannot continue.

We will need more journalists who can write a killer Freedom of Information request; more researchers with a knowledge of the hidden corners of the web where databases – the ‘invisible web’ – reside. We will need programmer-journalists who can write a screen scraper to acquire, sort, filter and store that information, and combine or compare it with other sources. We will need designers who can visualise that data in the clearest way possible – not just for editorial reasons but distribution too: infographics are an increasingly significant source of news site traffic.

There is a danger of ‘data churnalism’ – taking public statistics and visualising them in a spectacular way that lacks insight or context. Editors will need the statistical literacy to guard against this, or they will be found out.

And it is not just in editorial that innovation will be needed. Advertising sales will need to experience the same revolution that journalists have experienced, learning the language of web metrics, behavioural advertising and selling the benefits to advertisers.

And as publishers of data too, executives will need to adopt the philosophies of the open data and linked data movements to take advantage of the efficiencies that they provide. The New York Times and The Guardian have both published APIs that allow others to build web services with their content. In return they get access to otherwise unaffordable technical, mathematical and design expertise, and benefit from new products and new audiences, as (in the Guardian’s case) advertising is bundled in with the service. As these benefits become more widely recognised, other publishers will follow.

I have a hope that this will lead to a more collaborative form of journalism. The biggest resource a publisher has is its audience. Until now publishers have simply packaged up that resource for advertisers. But now that the audience is able to access the same information and tools as journalists, to interact with publishers and with each other, they are valuable in different ways.

At the same time the value of the newsroom has diminished: its size has shrunk, its competitive advantage reduced; and no single journalist has the depth and breadth of skillset needed across statistics, CAR, programming and design that data journalism requires. A new medium – and a new market – demands new rules. The more networked and iterative form of journalism that we’ve already seen emerge online is likely to become even more conventional as publishers move from a model that sees the story as the unit of production, to a model that starts with data.

August 04 2010


The Google Maps API server rejected your request. The "client" parameter specified in the request is invalid


I have pointed one of my sites to my site and generated a key for it, but it threw this error: "The Google Maps API server rejected your request. The "client" parameter specified in the request is invalid". Please help me.

July 12 2010


How APIs Help the Newsroom

As nice as it is to get praised for the civic-mindedness of your work, the not-so-secret secret about APIs at The Times is that we’re the biggest consumer of them. The flexibility and convenience that the APIs provide make it easier to cut down on repetitive manual work and bring new ideas to fruition. Other news organizations can do the same.

This week, for example, we launched a page to track Republican senators’ positions on the nomination of Elena Kagan to the Supreme Court. The fabulous graphics department has done things like this in the past, such as with the House vote on health care. Both of those graphics were assembled from lots of different pieces of information – electoral results and previous votes among them – and the Kagan data includes stuff like whether the senator in question is running for re-election this year.

You could, of course, ask people to gather up all that information, but if you’re going to do something like this more than once, it makes sense to have a way to automate as much as possible. That’s where the APIs come in. For the Kagan graphic, we used the NYT Congress API to pull in information on senators and their votes, which leaves the gathering of information about their statements on Kagan as the lone manual task. In other words, only the stuff that is specific to this app requires manual effort.
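The division of labour described here, with automated data from an API and one hand-maintained field, might look something like the sketch below. It does not call the NYT Congress API: `fetch_senators` is a stand-in that returns invented rows, and every name and field is hypothetical.

```python
def fetch_senators():
    """Stand-in for an API call; returns invented sample rows, not real data."""
    return [
        {"id": "S001", "name": "Senator A", "up_for_reelection": True},
        {"id": "S002", "name": "Senator B", "up_for_reelection": False},
    ]


# The one manual task: positions gathered by hand from public statements (invented here).
MANUAL_POSITIONS = {"S001": "undecided"}


def build_rows(senators, positions):
    """Attach the hand-collected position to each automatically fetched senator record."""
    return [dict(senator, position=positions.get(senator["id"], "unknown"))
            for senator in senators]


for row in build_rows(fetch_senators(), MANUAL_POSITIONS):
    print(row["name"], row["position"], row["up_for_reelection"])
```

When the next nomination comes along, only the hand-maintained dictionary needs replacing; the fetching and merging code is reused as it stands.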

Similarly, the new Districts API we released plays well with our other APIs, so that I was able to build a simple demo app that takes advantage of the fact that our Congress API, among others, can return the current member for a particular district.

For newsrooms, the utility of APIs goes beyond creating Web apps. Making data available via APIs is a little like giving the newsroom the ability to ask and answer questions without having to tie down a CAR person for long periods of time. APIs can provide data in whatever format you choose, which means that a wider range of people can take advantage, from graphic artists used to working with XML to reporters comfortable with CSV files. When your data is more accessible and flexible, the possibilities for doing things with it expand.
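As a small illustration of serving one dataset in more than one format, here is a sketch that renders the same records as JSON or CSV using only the Python standard library. The records are invented, and a real API would sit behind a web framework rather than a bare function.

```python
import csv
import io
import json

# Invented sample records, for illustration only.
RECORDS = [
    {"candidate": "Candidate A", "votes": 1200},
    {"candidate": "Candidate B", "votes": 950},
]


def render(records, fmt):
    """Serialize the same list of records as JSON or CSV text."""
    if fmt == "json":
        return json.dumps(records)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError("unsupported format: " + fmt)


print(render(RECORDS, "json"))
print(render(RECORDS, "csv"))
```

The data is stored once; the format is decided at the moment of the request, so the reporter who wants a spreadsheet and the developer who wants JSON are both reading the same source.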

So if you have a big local election coming up, having an API for candidate summary data makes it easier to do a quick-and-dirty internal site for reporters and editors to browse, but also gives graphics folks a way to pull in the latest data without having to ask for a spreadsheet. Chances are that if serious data analysis is what you need, that’ll be done in some desktop application or database server anyway. The API is just a messenger, albeit one that is always on and able to spawn lots of ideas and experiments.

If you’re looking to build an API, remember that it’s just a Web application delivering data in a structured format (XML and JSON being two popular formats these days). There are lots of options in terms of what you use to build and serve an API, so it’s important to pay attention to the design: which information you’ll deliver, and how. Being a significant user of your own API is really important, too; it’ll give you the best sense of how well you’ve designed your responses, and what you might be missing.

Tags: API

July 06 2010


Boston NPR affiliate WBUR celebrates its first year of running a news site, experiment with API

Boston’s NPR news station WBUR relaunched its website last July — drastically changing the site from what amounted to a brochure for the station’s radio shows to an active news publication in its own right. The results: Traffic doubled and the site is now being looked at as a model for other NPR stations.

The core of the revamp was aggressively tapping into the resources of NPR. The network’s content API allows WBUR to efficiently pull in NPR’s national and international stories in both text and audio format. Before the API, if a station wanted to provide users with NPR content, links took users away from the station’s site and to NPR’s.

“The secret sauce is we figured out in a very effective way to leverage NPR’s API,” John Davidow, executive editor of WBUR.org told me. The goal was to mimic what WBUR does on the radio, combining its own local content with NPR’s. (WBUR won the Edward R. Murrow Award for “Overall National Excellence” last month.) In an hour of public radio, the first six to eight minutes is the need-to-know news, followed by 52 to 58 minutes of analysis, content and in-depth reporting, Davidow said: “What we were wanting to do, and the API made it possible, was for us to whiteboard our online news approach with NPR content.” (You can see more about the back end of the redesign in this PowerPoint.)

Has the rest of the NPR family caught on to the secret sauce? I spoke with NPR’s director of application development Daniel Jacobson, who said that about half a dozen medium-to-large member stations have contacted him recently about using the API. A “common theme” on the calls has been a desire to reach out to WBUR for guidance. A number of public radio outlets have recently incorporated the API as well, like KQED and Minnesota Public Radio. Jacobson says NPR hasn’t tracked how many stations are using the API, but they know about 1 billion stories are being delivered through it every month. Those stories are consumed across platforms, from NPR’s own site to mobile applications and member station sites.

I asked Jacobson whether there was any concern that the API, by spreading NPR’s content around, could ultimately cause a drop in traffic on NPR’s own site. “All we’ve seen on our site is growth,” he said. “If WBUR is cannibalizing our traffic, we haven’t been able to detect it.” And even if it were, he says, that might not be a problem: NPR’s goal is to support the member stations.

The API is the centerpiece of NPR’s digital strategy. It’s what allows NPR to expand its mobile capacity, and it will play a part in the much-anticipated Project Argo later this year. Separately, another API program aims to unite public radio and public television content into a common platform.

Beyond the API, WBUR’s relaunch also required changes in workflows and staff responsibilities. Radio reporters now create web versions of their on-air work, and they’re responsible for gathering media (like photos and video) that had no role in a pre-web radio world. News doesn’t have to be broken first on the radio: “We put up the news [on the site] as fast as we can get the news,” Davidow said. “We’re so used to the old days, which is, something went on the radio, it went out to Venus and that was the end of it. It was very hard to archive, to find. Now, the hard work that our newsroom does, it’s there now. There’s a perpetual use to it, there’s a shelf life.”

WBUR’s rebirth online comes at an interesting time for news in Boston. On the radio dial, WGBH — Boston’s other NPR affiliate — switched away from its classical music format to compete directly with WBUR and is building collaborations with its popular PBS affiliate. The Boston Globe is not far removed from its own near-death experience, and rumors keep swirling about paywalls at both of Boston’s daily newspapers. If Boston.com were to become anything other than free, there’d be a free, high-quality alternative at WBUR. “We have no intention of charging for our content,” Davidow told me. However, he emphasized that WBUR is interested in collaboration and community with other news organizations: “It’s a long way of saying we’re not looking to compete with The Boston Globe.”

Photo by Theresa Thompson, used under a Creative Commons license.

May 22 2010


How can a news and content platform build a great API?

What are some of the characteristics that would make a great news platform API?
