August 02 2012

21:47

Announcing TimesOpen 2012

It's that time of the year again! We've just released our schedule for TimesOpen 2012. As always, we'll have four events leading up to an all-day hack day in December.

January 09 2012

20:10

NYT Districts API helps Fractured Atlas help artists

The arts, and the benefits to the public they provide, sometimes get lost, barely noticed by government. Fractured Atlas, a New York City-based multi-disciplinary arts service organization, is finding that by creating information and data services for its members and for city arts communities, it can also provide more effective advocacy for the arts to government. Fractured Atlas uses the New York Times' Districts API to create and run these services and to link them to its arts advocacy mission.

December 21 2011

16:05

New in the Campaign Finance API: Independent Expenditures

The Campaign Finance API now supports independent expenditure data.
Tags: APIs Open

December 07 2011

01:41

Recapping TimesOpen: Hack Day

Seventy developers, maybe more, visited the Times Building on Saturday for TimesOpen: Hack Day. They were joined by a couple dozen developers from The New York Times. The combined crowd occupied every available seat in the conference facility on the 15th floor of the Times Building. Practically everybody was there to program, and they all brought their coding chops and their creativity. In the end, 15 projects were demoed and 6 won prizes. And one project, HappyStance, earned the title of Best of Hack Day.

October 07 2011

17:46

TimesOpen full-length videos

Full-length videos from the first two TimesOpen events, HTML5 and Beyond, and Innovating Developer Culture, are now available.

September 15 2011

17:29

Recapping TimesOpen: Innovating Developer Culture

The second TimesOpen event of 2011, on Innovating Developer Culture, took place last Wednesday. The program was a departure from the usual code-heavy fare the meetings are known for. Jessica Lawrence of the New York Tech Meetup, an expert on organizational culture change, joined Etsy engineering manager Ken Little and Foursquare co-founder Naveen Selvadurai.

July 28 2011

16:24

The post-post-CMS CMS: Loosely coupled monoliths & read-only APIs

Creative Commons photo of an 'old school' newspaper layout by limonada on Flickr

As I sat down with my colleagues on Tuesday for a little hack day on our favourite open-source content management system, we had a familiar conversation — one that probably plays out wherever people hack on CMSs: What is the future of content management?

It’s a conversation that has been unfolding and evolving for many, many years, but it seems to be gaining a lot of steam again in 2011.

The essence of some of the more recent conversation is nicely summarized over on Stijn’s blog.

One of the questions presented is: will tomorrow’s CMS be a monolithic app, or a ‘confederation’ of purpose-built micro-applications — like the app that Talking Points Memo demonstrates for managing their front page, for example.

In the blog post on the TPM media lab about the ‘Twilight of the CMS,’ they describe their solution to the problem of the monolithic CMS — “a simple and flexible API that digests manifold requests from the different applications.”

As I ponder these ideas in the context of my work for online-only publishers like TheTyee.ca, I struggle with a few competing notions…

Read-only API, or Read/Write API

In the case of TheTyee.ca, the monolithic CMS writes data to a data store (Elastic Search) that directly provides the (soon to actually be public) public API. From there, various applications can request the data from the API and receive a structured JSON representation of that data back as a response. Basically, once these clients have the data, they can do what they want.

This is great. It’s a read-only scenario. The CMS is still the ‘authority’ on the state and structure of the data (along with a lot of other information), but there is an identical copy of that data in the store that provides the API.
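
To make that read-only flow concrete, here is a rough sketch (in Python) of what one of those client applications might look like. The endpoint, field names, and story ID are made up for illustration; the only point is that the client issues a plain HTTP GET and receives structured JSON back, while writes still happen only in the CMS.

    import json
    import urllib.request

    # Hypothetical read-only endpoint backed by the same store the CMS writes to.
    API_BASE = "https://api.example-publisher.ca/v1"

    def fetch_story(story_id):
        """Fetch one story as structured JSON from the read-only API."""
        url = "{}/stories/{}".format(API_BASE, story_id)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    if __name__ == "__main__":
        story = fetch_story("example-story-id")
        # The client decides what to do with the data: render a front page,
        # a mobile view, a newsletter, and so on.
        print(story.get("title"))
        print(story.get("teaser"))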

Now, let’s think about that TPM front page management app: it clearly needs read/write access to their API, because it can change not just layout, but editorial content like the story title, deck, teaser, and so on.

So, if the API is read/write, the questions I have are:

  • The schema for the documents (the stories, etc.) must be validated somewhere, right? So… does that logic live in each purpose-built app, or as a layer on top of the data store (see the sketch after this list)? And does that then violate a bit of the ‘Don’t Repeat Yourself’ principle?

  • Do these content-centric writes to the API make their way back to the CMS or editorial workflow system? And, if they don’t, does that not introduce some confusion about mismatched titles, decks, teasers, and so on? For example, say I change the title of a story on the front page, but now I see a typo in the body of the story and want to fix that, so I go into the other CMS and search for … whoops! … what was the title of that story again?

  • How does this new ‘front page app,’ or the read/write API, handle typically CMS-y things like competing or conflicting write requests? Or version control? Or audit trails of who made which edits? If one, or the other, or both, actually handle these concerns, is this not a duplication of logic that’s already in the CMS?
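
To illustrate the first question in the list above, here is a minimal sketch of schema validation living as a single layer in front of the data store, so that every app writing through the API shares one definition instead of repeating it. The field names and rules are assumptions for illustration, not TPM's or TheTyee.ca's actual schema.

    # One shared validation layer in front of the data store.
    # Field names and limits are illustrative assumptions only.

    REQUIRED_FIELDS = {"title", "deck", "teaser", "body"}
    MAX_LENGTHS = {"title": 120, "deck": 200, "teaser": 300}

    class ValidationError(Exception):
        pass

    def validate_story(doc):
        """Check a story document before it is written to the store."""
        missing = REQUIRED_FIELDS - set(doc)
        if missing:
            raise ValidationError("missing fields: {}".format(sorted(missing)))
        for field, limit in MAX_LENGTHS.items():
            if len(doc[field]) > limit:
                raise ValidationError("{} exceeds {} characters".format(field, limit))
        return doc

    def write_story(store, doc):
        """Every writer (the CMS, a front-page app, etc.) goes through this layer."""
        return store.save(validate_story(doc))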

Perhaps I’m not thinking about this right, but my gut is saying that the idea of a read/write API — a scenario where you have both a CMS (Movable Type in TPM’s case) and a ‘front page management’ app — starts to get a bit tricky when you think about all the roles that the CMS plays in the day-to-day of a site like TPM.

It gets even more tricky when you think about all the delivery mediums that have their own ‘front page’ — tablet experiences, scaled-down mobile experiences, feeds, e-newsletters, and so on.

Presentation management or editorial management

The other thing that is immediately striking about the TPM demo is the bizarrely print-centric feel to the experience — I’m immediately transported back to my (very brief) days working at The Varsity where the editors and designers would literally paste up the newspaper’s pages on big boards.

For a publication like the TPM — an entirely online ‘paper’ — it seems like an odd, slightly ‘retro,’ approach in an age that is defined by content that defies containers. One must ask: where does it end? Should there be a purpose-built app for each section’s front page, e.g., Sports, Arts, Life, etc.? For each regional section? For each-and-every article?

Isn’t this just vanity at some level? Endless bit-twiddling to make things look ‘just right’? Kinda’ like those mornings when I just can’t decide whether to wear a black shirt or a white shirt and stand in front of the mirror trying them on for what seems like eternity.

So, coming back to my point: in a time when many believe (like a religion!) that content and presentation should be separated — not as an exercise, but because that content is delivered to literally hundreds of different end-user experiences (phones, tablets, readers, etc.) — do we really want to be building tools that focus on the presentation for just one of those experiences? If so, where does it end?

For the most part, the modern-day CMS has been designed to alleviate these myriad challenges by providing a way for non-technical people to input structured data, and the tools for developers to output that structured data in a variety of ways, formats, and mediums.

Watching the TPM video gives me some ideas about how to improve the experience for an editor to quickly edit headlines, decks, teasers, photos of the morning’s stories — and even to indicate their relative priority in terms of newsworthiness — but I would want to stop there, at the editorial, and let the presentation layer be handled according to the medium, device, or experience the content is being delivered to.

Loosely coupled monoliths & read-only APIs

Many moons ago, I proposed that Weinberger’s Small Pieces Loosely Joined idea held true for content management also. The proposal was simple: instead of investing in one monolithic CMS — a CMS that did everything from managing content to delivering advertising to handling comments, search, and who-knows-what (a trend in CMS projects at the time) — an organization could choose the current ‘best in class’ solution for each need and connect them together through loose coupling. Then, if a better solution came out for, say, comments, the old system could be replaced with the newer one without having to re-build the whole enchilada.

(Of course, the flip side often is that loose coupling can feel like bubble gum and string when you have to work with it every day.)

So, while my own experience is that loose coupling is great, and that purpose-specific applications are usually better than apps that try to do everything, I would personally want to draw the line somewhere. For me, that line is between distinct ‘areas of responsibility,’ like editorial, advertising, design, community, search, and so on.

In this scenario, each area would have the authority over its own data, and the logic for how that data is structured and validated, and so on. If that data was written to a central data store that provided an API — something simple, flexible, and RESTful — the other apps in a ‘confederation’ could read from it, choose what to do with it, how to present it, and so on, but the final ‘say’ on that data would be from the app that is responsible for creating it.
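
As a rough sketch of that ‘final say’ idea, the API in front of the central store could simply refuse writes for any document type from an app that does not own it. The app names and document types below are hypothetical.

    # One writer per area of responsibility; every other app is read-only.
    # App identifiers and document types are hypothetical.

    OWNERS = {
        "story":   "editorial-cms",
        "ad":      "ad-server",
        "comment": "comment-system",
    }

    def can_write(app_id, doc_type):
        """Only the app that owns a document type may write it."""
        return OWNERS.get(doc_type) == app_id

    # The front-page app can read stories, but its write attempts are rejected:
    assert can_write("editorial-cms", "story")
    assert not can_write("front-page-app", "story")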

For me, this is a sensible way to allow these apps to work in concert without having the logic about the data living in multiple places, i.e., the API, and the clients that can write to it (which makes sense if you’re Twitter with hundreds of external clients, but not if you’re one organization building exclusively internal client apps).

Would love to hear otherwise, or experiences of how others are handling this or thinking about the challenge.

June 02 2011

18:53

New in the Campaign Finance API: Paper Filings

Today we're announcing the addition of paper campaign filings to our Campaign Finance API, which previously had only provided details of electronically filed reports.

May 24 2011

20:36

NYTWrites: Exploring Topics and Bylines

Irene Ros, a research developer at the IBM Visual Communication Lab, has created a "sketch project" that uses the Article Search API to explore the topics covered by Times reporters.
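
For a sense of what such a sketch project involves, here is a minimal example that tallies the subject keywords attached to articles returned by the Article Search API. It assumes the v2 articlesearch endpoint and an API key in an NYT_API_KEY environment variable; parameter and response field names may differ between API versions, and this is not Ros's actual code.

    import json
    import os
    import urllib.parse
    import urllib.request
    from collections import Counter

    # Assumes the v2 Article Search endpoint and an NYT_API_KEY variable.
    BASE = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

    def search(query, page=0):
        """Fetch one page of article documents matching a query."""
        params = urllib.parse.urlencode({
            "q": query,
            "page": page,
            "api-key": os.environ["NYT_API_KEY"],
        })
        with urllib.request.urlopen(BASE + "?" + params) as resp:
            return json.load(resp)["response"]["docs"]

    def topic_counts(query):
        """Count keyword values across the articles matching a query."""
        counts = Counter()
        for doc in search(query):
            for keyword in doc.get("keywords", []):
                counts[keyword.get("value")] += 1
        return counts

    if __name__ == "__main__":
        for topic, n in topic_counts("open government data").most_common(10):
            print(n, topic)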

May 03 2011

14:30

PBS plays Google’s word game, transcribing thousands of hours of video into crawler-friendly text

PBS' new video search engine

Blogs and newspaper sites enjoy a built-in advantage when it comes to search-engine optimization. They deal in words. But a whole universe of audio and video content is practically invisible to Google.

Say I want to do research on Osama bin Laden. A web search would return news articles about his assassination, a flurry of tweets, the Wikipedia page, Michael Scheuer’s biography, and an old Frontline documentary, “Hunting Bin Laden.” I might then take my search to Lexis Nexis and academic journals. But I would never find, for example, Frontline’s recent reporting on the Egyptian revolution, where bin Laden makes an appearance, or any number of other video stories in which the name is mentioned.

While video and audio transcripts are rich material for Google to mine, producing them is time-consuming and expensive. PBS is out to fix that by building a better search engine. The network has automatically transcribed and tagged more than 2,000 hours of video using software called MediaCloud.

“Video is now more Google-friendly,” said Jon Brendsel, the network’s vice president of product development. Normally, automatic transcription is laughably bad — Google Voice users know this — but Brendsel is satisfied with the results of PBS’ transcription efforts. He said the accuracy rate is about 80 to 90 percent. That’s “much better than the quality that I normally attribute to closed captioning,” he said. The software can get away with mistakes because the transcripts are being read by computers, not people. (For a hefty fee, the content-optimization platform RAMP will put its humans to work to review and refine the auto-generated transcripts.)

Query “Osama bin Laden” at PBS’ video portal, and the new search engine returns videos in which the phrase appears, including time codes. “Osama bin Laden found at 33:32,” reads one result. (So that’s where he was?) Mouse over the text to see the keyword in context; click it to be deposited at the precise moment the keyword is spoken. (Notice the text “Osama bin Laden” appears nowhere on the resulting page.)
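
To illustrate the mechanics behind that feature, here is a rough sketch of a keyword search over time-coded transcript segments; the matching segments and their offsets are all a player needs to seek to the right moment. The transcript structure below is invented for illustration and is not PBS's actual index format.

    # Search time-coded transcript segments for a phrase and report where
    # in each video it is spoken. The transcript data is hypothetical.

    transcripts = {
        "frontline-example-episode": [
            {"start": "00:12:04", "text": "the hunt continued for years"},
            {"start": "00:33:32", "text": "osama bin laden was finally located"},
        ],
    }

    def find_mentions(phrase):
        """Return (video, time code, snippet) for every matching segment."""
        phrase = phrase.lower()
        hits = []
        for video, segments in transcripts.items():
            for segment in segments:
                if phrase in segment["text"].lower():
                    hits.append((video, segment["start"], segment["text"]))
        return hits

    for video, start, text in find_mentions("osama bin laden"):
        print("{} found at {}: {}".format(video, start, text))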

PBS’ radio cousin, NPR, still relies on humans for transcription, paying a third-party service to capture 51 hours of audio a week. In-house editors do a final sweep to ensure accuracy of proper names and unusual words. It’s expensive, though NPR does not disclose how much, and time-consuming, with a turnaround time of four to six hours.

“We continue to keep an eye on automated solutions, which have gradually improved over time, but are not of sufficiently high quality yet to be suitable for licensing and other public distribution,” said Kinsey Wilson, NPR’s head of digital media.

Despite the expense, NPR decided to make all transcripts available for free when relaunching its website in July 2009. “Transcripts were once largely the province of librarians and other specialists whose job was to find archival content, often for professional purposes,” Wilson said at the time. “As Web content becomes easier to share and distribute, and search and social media have become important drivers of audience engagement, archival content — whether in the form of stories or transcripts — has an entirely different value than it did in the past.”

Put another way: Readers today (kids today!) are accustomed to search as a shortcut to obtaining information. If Google doesn’t index your content, it might as well not exist. (And there are other emerging platforms in the layering-text-on-video game — Universal Subtitles, for example, which essentially crowdsources captioning efforts.) Brendsel said mass indexing is a much more complicated project for PBS, because PBS does not own its content, unlike NPR. The network has to work out rights with multiple producers. And the transcription software is also expensive, he said. PBS is still working out a financial model for extending this service to local stations.

Brendsel plans to offer human-readable transcripts on story pages soon, when the video portal gets a design refresh. That will be the final step in making PBS video truly Google-friendly, allowing search engines to crawl its text.

April 20 2011

20:45

Best Sellers: A Perspective on E-Books

Writer and coder Robin Sloan is using the Times Best Sellers API to provide a new perspective on the e-book market.

April 01 2011

21:17

More Best Sellers Data

The Best Sellers API now extends back to June 8, 2008.
19:22

Research shows benefits of open innovation for news

The first research paper at ISOJ, from Tanja Aitamurto of the University of Tampere, Finland, and Seth Lewis of the University of Minnesota, looked at processes of innovation (PDF).

Presenting the paper, Lewis highlighted the challenge facing news organisations today: keeping up with modern demands for R&D while finding new sources of revenue.

He said media organisations have underinvested in R&D and have not expressed much interest in open innovation.

The paper looked at NPR, the New York Times, The Guardian and USA Today. Lewis highlighted how the Guardian lets developers access APIs to do new things with their content.

In exchange, advertising appears on these new products and services, based on Guardian content.

The biggest benefit Aitamurto and Lewis found was the speeding up of internal and external product development.

Essentially, it allows others to experiment with content, and to access groups of users that would have been hard to reach before.

Secondly, open innovation offered opportunities for new revenue streams. And this fed into the third benefit – the ability to leverage the brand and drive traffic. The impact, said Lewis, was to have the brand seen as a platform, rather than a product.

The fourth benefit cited by Lewis was the potential to build a community of developers. For example, the Times described it as good street cred.

The challenges were more cultural than technological, said Lewis. Corporate leaders didn’t like the word ‘open’ and were more receptive to language such as ‘business development Web 2.0’.

Lewis concluded by saying that open innovation meant news content gained a new life, and that it would take news organisations and weave them into the structure of the web.

February 11 2011

21:22

Best Sellers API: Now With E-Books

We've added e-book data to the Best Sellers API.
Tags: APIs

February 10 2011

19:11

Updates to the Campaign Finance API

Political campaigns don't have an off-season, but the brief lull between last November's general election and now has given us time to make some updates to our Campaign Finance API.

January 05 2011

20:01

Updates to the New York State Legislature API

A new year brings with it a new government in Albany, including Gov. Andrew Cuomo and a new cast of state lawmakers. To prepare, we've updated our New York State Legislature API.
Tags: APIs politics

December 23 2010

21:40

Congress API Update: Nominee Details Responses

We've made a minor fix to the Nominee Details Responses in the Times Congress API.

December 21 2010

18:00

Jennifer 8. Lee on raw data, APIs, and the growth of “Little Brother”

Editor’s Note: We’re wrapping up 2010 by asking some of the smartest people in journalism what the new year will bring.

Here, Jennifer 8. Lee gives us predictions about the growing role of raw data, the importance of APIs, and the need for a break-out civic mobile app.

Raw data and the rise of “Little Brother”

In 2011 there will be a slew of riffs on the WikiLeaks anonymous dropbox scheme, sans gender drama — at least one of them by former WikiLeakers themselves. It will remain to be seen how protective the technologies are.

Basically, this codifies the rise of primary source materials — documents, video, photos — as cohesive units of consumable journalism. Turns out, despite the great push for citizen journalism, citizens are not, on average, great at “journalism.” But they are excellent conduits for raw material — those documents, videos, or photos. They record events digitally as an eyewitness, obtain documents through Freedom of Information requests, or have access to files through the work they do. We are seeing an important element of accountability journalism emerge.

Big Brother has long been raised as a threat of technological advancement (and certainly the National Security Agency has done its fair share of snooping). But in reality, it is the encroachment of Little Brother that average Americans are more likely to feel in our day-to-day lives — that people around us carry digital devices that can be pulled out for photos or videos, or that they can easily copy digital files (compared to the months of covert photocopying that Ellsberg did for 7,000 pages) that others would rather not have shared with the world.

One notable strength of raw material is that it has a natural viral lift for two reasons: audience engagement, and the way legacy media operates with regard to sourcing and competition. Social media is a three-legged stool: create, consume, and share content. Because original material often feels more like an original discovery, it is more appealing to share. Documents, videos, and photos are there for anyone to examine and experience firsthand. The audience can interpret, debate, comment as they choose, and they feel greater freedom to reupload and remix that material, especially video.

The importance of APIs

There will also be an explosion in the shift from raw data to information made available through application programming interfaces. A good example is ScraperWiki, out of the United Kingdom, which scrapes government data into repositories and then makes it available through an API.

Government agencies are hearing the public cry for data, and they are making raw data available. Sometimes it’s in friendlier formats like .csv or .xls. Sometimes it is in less usable formats, like PDF (as the House of Representatives did with a 3,000-page PDF of expenses) and even .exe files. (As the Coast Guard’s National Response Center has done with its incident data. It’s an extractable .xls with a readme. I know. It makes a lot of people cringe. At least their site isn’t also in Flash.) As part of this open push, the Obama administration has set up data.gov.

As that data comes out, people are realizing that raw data alone is not enough to get the public to bite, even though the underlying data might contain interesting material. It needs to be even easier to access. A good example of what happens when something becomes easily searchable: ProPublica’s Dollars for Docs project, on payments doctors received from pharmaceutical companies, generated an explosion of interest and investigations by taking data that was already technically public and standardizing it to make it searchable on the Internet.

What we need: the great civic mobile app

What we’re still waiting for: The break-out civic mobile app, a combination of Craigslist and Foursquare, where a critical mass of people can “check in” with comments, photos and complaints about their local community. It’s unclear how this will happen. Perhaps it will be built on the geolocation tools offered by Facebook or Twitter. Perhaps it will be an extension of Craigslist, which already has a brand associated with local community. Perhaps it’ll be something like SeeClickFix, which allows people to register complaints about potholes or graffiti, or CitySeed, a mobile app the Knight Foundation has given a grant to develop.

[Disclosure: Both the Knight Foundation and Lee are financial supporters of the Lab.]

December 10 2010

21:03

More TimesOpen Hacks

Earlier this week, we promised you more details about the hacks submitted at our TimesOpen Hack Day. Here they are to help kick off your weekend.

November 29 2010

22:23

Campaign Finance API Updates

We've made a few tweaks and additions to the Campaign Finance API.