Tumblelog by Soup.io

February 16 2011

19:00

Dataviz, democratized: Google opens Public Data Explorer

Two years ago, Google acquired Gapminder, the Swedish graphics-display company whose Trendalyzer software specializes in representing data over time. (You may recall the company from this awesome and much-circulated TED talk from 2006.) Since the acquisition, Google has built out the Trendalyzer software to create its Public Data Explorer, a tool that makes large datasets easy to visualize — and, for consumers, to play with. The Explorer has created interactive and dynamic data visualizations of information about traditionally hard-to-grasp concepts like unemployment figures, income statistics, world development indicators, and more. It’s a future-of-context dream.

“It’s about not just looking at data, but really understanding and exploring it visually,” Benjamin Yolken, Google Public Data’s product manager, told me. The project’s overall mission, it’s worth noting, is a kind of macro-meets-meta version of journalism’s: “to make the world’s public data sets accessible and useful.”

The big catch, though, as far as journalism goes, has been that users haven’t been able to do much with the tool besides look at it. If you’ve gathered public data sets that would lend themselves to visualization on the Explorer, you’ve had to contact Google and ask them to visualize it for you. (“While we won’t be able to individually reply to everyone who fills out this form,” a contact form noted, “we may be in touch to learn more about your data.”)

Today, though, that’s changing: Google is opening up its Explorer tool. Yolken and Omar Benjelloun, Google Public Data’s tech lead, have written a new data format, the Dataset Publishing Language (DSPL), designed particularly to support dynamic dataviz. “DSPL is an XML-based format designed from the ground up to support rich, interactive visualizations like those in the Public Data Explorer,” Benjelloun notes in a blog post announcing the opening. (It’s the same language that the Public Data team had been using internally to produce its datasets and visualizations.) Today, that language — and an interface facilitating data upload — are available for anyone to use, putting the “public” in “public data.”

It’s an experimental feature that, like the Public Data Explorer itself — not to mention some of Google’s most fun features (Google Scribe, Google Body, Google Books’ Ngrams viewer, etc.) — lives under the Google Labs umbrella. And, importantly, it’s a feature, Yolken notes, that “allows users who may or may not have technical expertise to explore, visually, a number of public data sets.”

The newly open tool could be particularly useful for news organizations that would like to get into the dataviz game, but that don’t have the resources (of time, of talent, of money) to invest in proprietary systems. (The papers of the Journal Register Company, a news organization that has made a point of experimenting with free, web-based journalistic tools, come to mind here, though any news outfit, big or small, could benefit.) The Public Data team had two main goals in opening up the Explorer tool to users, Yolken notes: increasing the datasets available to be visualized and, then, distributing them. “First, we want to have lots of data sets available that are credible and useful and interesting,” he says. Second, the hope is that the tool’s embedding capabilities will allow for easy sharing of those data sets.

Though the Explorer platform is now open to anyone — and though Yolken and Benjelloun mention teachers and students as groups who might do some interesting experiments with it — they hope that journalists, in particular, will make use of the tool. Even more particularly: “data-driven journalists.”

To that end, the tool isn’t as intuitively understandable as, say, the awesomely easy Ngrams book viewer tool — “we realized that, in order to show the data properly, to make the data understandable, you really needed to describe the metadata,” Benjelloun notes — but nor does it require special expertise to use. “This format doesn’t require engineering skills,” Yolken says; then again, “it’s not as easy as a spreadsheet.” It’s somewhere in the middle — akin to learning, say, basic HTML. (Here’s more on how to use it.)

But if journos can get beyond the initial learning curve (one that, for data-driven journos, in particular, won’t be especially steep), they, and their readers, could benefit doubly. The Explorer tool allows users not just to create dynamic data visualizations, but also to avail themselves of a new way to understand those data in the first place. In other words: The tool could prove useful from both the presentation and the production ends of the journalistic spectrum. There’s something about watching data move over time, Yolken notes, that changes your perspective as a consumer of those data. “It makes you start asking questions that you wouldn’t have asked before.”

May 04 2010

08:36

Data journalism pt5: Mashing data (comments wanted)

This is a draft from a book chapter on data journalism (part 1 looks at finding data; part 2 at interrogating data; part 3 at visualisation; and part 4 at visualisation tools). I’d really appreciate any additions or comments you can make – particularly around tips and tools.

Mashing data

Wikipedia defines a mashup particularly succinctly, as “a web page or application that uses or combines data or functionality from two or more external sources to create a new service.” Those sources may be online spreadsheets or tables; maps; RSS feeds (which could be anything from Twitter tweets, blog posts or news articles to images, video, audio or search results); or anything else which is structured enough to ‘match’ against another source.

This ‘match’ is typically what makes a mashup. It might be matching a city mentioned in a news article against the same city in a map; or it may be matching the name of an author with that same name in the tags of a photo; or matching the search results for ‘earthquake’ from a number of different sources. The results can be useful to you as a journalist, to the user, or both.
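To make the idea of a ‘match’ concrete, here is a minimal sketch in Python. It assumes a tiny, hand-made lookup table of city coordinates and a few invented headlines (all of the names and data below are hypothetical, for illustration only) and pairs each headline with any city it mentions, which is roughly what a map mashup does before plotting anything.

```python
# Hypothetical sample data: a tiny gazetteer and three invented headlines.
CITY_COORDS = {
    "Birmingham": (52.4862, -1.8904),
    "Leeds": (53.8008, -1.5491),
}

ARTICLES = [
    {"title": "Protest march planned in Birmingham"},
    {"title": "Leeds council publishes spending data"},
    {"title": "Budget announced"},
]


def match_articles_to_places(articles, coords):
    """Return one record per (article, city) pair where the city appears in the title."""
    matches = []
    for article in articles:
        for city, (lat, lon) in coords.items():
            # The 'match': the same city name exists in both sources.
            if city.lower() in article["title"].lower():
                matches.append(
                    {"title": article["title"], "city": city, "lat": lat, "lon": lon}
                )
    return matches


PLOTTED = match_articles_to_places(ARTICLES, CITY_COORDS)
for row in PLOTTED:
    print(row["city"], row["lat"], row["lon"], row["title"])
```

The third headline mentions no known city, so it is simply left off the map. A real mashup would need a far larger gazetteer and some care over ambiguous place names.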

Why make a mashup?

Mashups can be particularly useful in providing live coverage of a particular event or ongoing issue – mashing images from a protest march, for example, against a map. Creating a mashup online is not too dissimilar from how, in broadcast journalism, you might set up cameras at key points around a physical location in anticipation of an event from which you will later ‘pull’ live feeds: in a mashup you are effectively doing exactly the same thing – only in a virtual space rather than a physical one. So, instead of setting up a feed at the corner of an important junction, you might decide to pull a feed from Flickr of any images that are tagged with the words ‘protest’ and ‘anti-fascist’.

Some web developers have built entire sites that are mashups. Twazzup (twazzup.com) for example, will show you a mix of Twitter tweets, images from Flickr, news updates and websites – all based on the search term you enter. And Friendfeed (friendfeed.com) pulls in data that you and your social circle post to a range of social networking sites, and displays them in one place.

Mashups also provide a different way for users to interact with content – either by choosing how to navigate (for instance by using a map), or by inviting them to input something (for instance, a search term, or selecting a point on a slider). The Super Tuesday YouTube/Google Maps mashup, for instance, provided an at-a-glance overview of what election-related videos were being uploaded where across the US.

Finally, mashups offer an opportunity for juxtaposing different datasets to provide fresh, sometimes ongoing, insights. The MySociety/Channel 4 project Mapumental, for example, combines house price data with travel information and data on the ’scenicness’ of different locations to provide an interactive map of a location which the user can interrogate based on their individual preferences.

Mashup tools

Like so many aspects of online journalism, the ease with which you can create a mashup has increased significantly in recent years. An increase in the number and power of online tools, combined with the increasing ‘mashability’ of websites and data, means that journalists can now create a basic mashup through the simple procedures of drag-and-drop or copy-and-paste.

A simple RSS mashup, which combines the feeds from a number of different sources into one, for example, can now be created using an RSS aggregator such as xFruits (xfruits.com) or Jumbra (jumbra.com).
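For those curious about what an aggregator is doing behind the scenes, the sketch below merges two RSS feeds into one list, newest first, using only Python's standard library. The two feeds are invented and embedded as strings so the example runs offline; in practice you would fetch them from real feed URLs. It illustrates the general technique and is not how any particular aggregator is implemented.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feeds, embedded as text so the sketch runs without a network.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Council vote</title><link>http://example.com/a1</link>
<pubDate>Mon, 03 May 2010 09:00:00 +0000</pubDate></item>
<item><title>Road closure</title><link>http://example.com/a2</link>
<pubDate>Tue, 04 May 2010 07:30:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>School report</title><link>http://example.com/b1</link>
<pubDate>Mon, 03 May 2010 12:00:00 +0000</pubDate></item>
<item><title>Protest photos</title><link>http://example.com/b2</link>
<pubDate>Tue, 04 May 2010 08:00:00 +0000</pubDate></item>
</channel></rss>"""


def parse_items(rss_text, source):
    """Extract title, link and publication time from each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS dates use the RFC 822 format, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def merge_feeds(feeds):
    """Combine the items of several feeds into one list, newest first."""
    merged = []
    for source, text in feeds.items():
        merged.extend(parse_items(text, source))
    return sorted(merged, key=lambda entry: entry["published"], reverse=True)


MERGED = merge_feeds({"A": FEED_A, "B": FEED_B})
for entry in MERGED:
    print(entry["published"].isoformat(), entry["source"], entry["title"])
```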

Likewise, you can mix two maps together using the website MapTube (maptube.org) which also contains a number of maps for you to play with.

And if you want to mix two sources of data into one visualisation the site DataMasher (datamasher.org) will let you do that – although you’ll have to make do with the US data that the site provides. Google Public Data Explorer (google.com/publicdata) is a similar tool which allows you to play with global data.

But perhaps the most useful tool for news mashups is Yahoo! Pipes (pipes.yahoo.com).

Yahoo! Pipes allows you to choose a source of data – it might be an RSS feed, an online spreadsheet or something that the user will input – and do a variety of things with it. Here are just some of the basic things you might do:

  • Add it to other sources
  • Combine it with other sources – for instance, matching images to text
  • Filter it
  • Count it
  • Annotate it
  • Translate it
  • Create a gallery from the results
  • Place results on a map
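Several of these operations are easy to picture as small functions chained together. The sketch below is plain Python, not Yahoo! Pipes itself, and the feed items are invented; it simply shows what ‘filter’, ‘count’ and ‘annotate’ might mean when applied to a list of items.

```python
from collections import Counter

# Hypothetical items, as they might look after several feeds have been combined.
ITEMS = [
    {"title": "Earthquake hits coastal town", "source": "news"},
    {"title": "Photos: earthquake damage", "source": "flickr"},
    {"title": "Local election results", "source": "news"},
    {"title": "Earthquake relief appeal launched", "source": "blog"},
]


def filter_items(items, keyword):
    """Filter: keep only items whose title contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [item for item in items if keyword in item["title"].lower()]


def count_by_source(items):
    """Count: tally how many items came from each source."""
    return Counter(item["source"] for item in items)


def annotate(items, note):
    """Annotate: return copies of the items with an extra 'note' field."""
    return [dict(item, note=note) for item in items]


QUAKE = annotate(filter_items(ITEMS, "earthquake"), "matched: earthquake")
COUNTS = count_by_source(QUAKE)

for item in QUAKE:
    print(item["source"], "|", item["title"], "|", item["note"])
print(dict(COUNTS))
```

Each function takes a list and returns a new one, so the steps can be rearranged or extended in much the same spirit as connecting modules in a visual tool.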

You could write a whole book on how to use Yahoo! Pipes – indeed, people have – so we will not cover the practicalities of using all of those features here. There are also dozens of websites and help files devoted to the site (which you should explore). Below, however, is a short tutorial to introduce you to the website and how it works – this is a good way to understand how basic mashups work, and how easily they can be created.

Mashups and APIs

Although there are a number of easy-to-use mashup creators listed above, really impressive mashups tend to be written by people with knowledge of programming languages, and use APIs. APIs (Application Programming Interfaces) allow websites to interact with other websites. The launch of the Google Maps API in 2005, for example, has been described as a ‘huge tipping point’ in mashup history (Duvander, 2008) as it allowed web developers to ‘mash’ countless other sources of data with maps. Since then it has become commonplace for new websites, particularly in the social media arena, to launch their own APIs in order to allow web developers to do interesting things with their feeds and data – not just mashups, but applications and services too.

If you want to develop a particularly ambitious mashup it is likely that you will need to teach yourself some programming skills, and familiarise yourself with some APIs (the APIs of Twitter, Google Maps and Flickr are good places to start).

Box-out: Anatomy of a feed

The image below from ReadWriteWeb shows the code behind a simple Twitter update. It includes information about the author, their location, whether the update was a reply to someone else, what time and where it was created, and lots more besides. Each of these values can be used by a mashup in various ways – for example, you might match the author of this tweet with the author of a blog or image; you might match its time against other things being published at that moment; or you might use their location to plot this update on a map.

While the code can be intimidating, you do not need to understand programming in order to be able to do things with it. Of course, it will help if you do…
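As a rough illustration of how those values might be pulled out, the sketch below parses a single update written as JSON. The structure and field names are invented for this example and are only loosely modelled on the kind of data described above; they are not the actual Twitter format, so check the relevant API documentation before relying on any of them.

```python
import json
from datetime import datetime

# Hypothetical update: field names are assumptions, not a real API's schema.
RAW_UPDATE = """
{
  "text": "Crowds gathering outside the town hall",
  "created_at": "2010-05-04T08:36:00",
  "in_reply_to": null,
  "author": {"name": "Example Reporter", "location": "Birmingham, UK"},
  "coordinates": {"lat": 52.4797, "lon": -1.9027}
}
"""


def extract_mashup_fields(raw):
    """Pick out the values a mashup could match on: who, when, reply status and where."""
    update = json.loads(raw)
    coords = update.get("coordinates") or {}
    return {
        "author": update["author"]["name"],                     # match against a blog or photo author
        "posted": datetime.fromisoformat(update["created_at"]),  # match against other items by time
        "is_reply": update.get("in_reply_to") is not None,
        "map_point": (coords.get("lat"), coords.get("lon")),    # plot on a map
    }


FIELDS = extract_mashup_fields(RAW_UPDATE)
print(FIELDS)
```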

[Image: Anatomy of a Twitter feed]
