At the initiative of several NetSquared member institutions (http://www.netsquared.org) and with the support of TechSoup Global (http://www.techsoupglobal.org/), we are building a network of organizations and individuals dedicated to the use of technology for social good in the Latin American region, called RedlaTic.
The NetSquared Regional Conference for Cameroon and Nigeria is a multi-stakeholder event that will bring together actors from local NetSquared groups, the Internet Society, civil society, diplomatic institutions, government, and the tech world to address issues related to the social web and nongovernmental diplomacy. Over a two-day event, citizens from three neighboring countries, Cameroon, Nigeria, and the Central African Republic, will seek to resolve the following challenges:
- The difficulties faced in introducing the social web for social development in the sub-region
Hundreds of delegates from government, civil society, and business gathered in Brasilia recently for the first Open Government Partnership meetings since the initiative's inception. The main issues debated were transparency, accountability, and open data as fundamental building blocks of a new, open form of government. We took the occasion of these meetings to expand an open data set by adding street names to OpenStreetMap.
Getting ready to survey the Cruzeiro neighborhood in Brasilia.
OpenStreetMap, sometimes dubbed the "Wikipedia of maps," is an open geospatial database. Anyone can go to openstreetmap.org, create an account, and add to the world map. The accessibility of this form of contribution, paired with the openness of its common data repository, holds a powerful promise of commoditized geographic data.
As this data repository evolves, along with corresponding tools, many more people gain access to geospatial analysis and publishing -- which previously was limited to a select few.
When Steve Coast founded OpenStreetMap in 2004, the proposition to go out and crowdsource a map of the world must have sounded ludicrous to most. After pivotal growth in 2008 and the widely publicized rallying around mapping Haiti in 2010, the OpenStreetMap community has proven how incredibly powerful a free-floating network of contributors can be. There are more than 500,000 OpenStreetMap contributors today. About 3 percent (that's still a whopping 15,000 people) contribute a majority of the data, with roughly 1,300 contributors joining each week. Around the time when Foursquare switched to OpenStreetMap and Apple began using OpenStreetMap data in iPhoto, new contributors jumped to about 2,300 per month.
As the Open Government Partnership meetings took place, we wanted to show people how easy it is to contribute to OpenStreetMap. So two days before the meetings kicked off, we invited attendees to join us for a mapping party, where we walked and drove around neighborhoods surveying street names and points of interest. This is just one technique for contributing to OpenStreetMap, one that is quite simple and fun.
Here's a rundown of the most common ways people add data to OpenStreetMap.
It takes two minutes to get started with contributing to OpenStreetMap. First, create a user account on openstreetmap.org. You can then immediately zoom to your neighborhood, hit the edit button, and get to work. We recommend that you also download the JOSM editor, which is needed for more in-depth editing.
Once you start JOSM, you can download an area of OpenStreetMap data, edit it, and then upload it. Whatever you do, it's crucial to add a descriptive commit message when uploading -- this helps other contributors figure out the intent and context of an edit. Common first edits are adding street names to unnamed roads, fixing typos, and adding points of interest like a hospital or a gas station. Keep in mind that any information you add to OpenStreetMap must be observed fact or taken from data in the public domain -- so, for instance, copying street names from Google is a big no-no.
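For the curious, here is a rough sketch of what an edit like "add a hospital" boils down to under the hood: in OpenStreetMap's data model, a point of interest is a single node with a latitude, a longitude, and key=value tags (amenity=hospital and name=... are standard tags). JOSM writes this XML for you; the coordinates and names below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A point of interest as a single OSM node with key=value tags.
# id="-1" is the convention editors use for a not-yet-uploaded node.
node = ET.Element("node", id="-1", lat="-15.7942", lon="-47.8822")
for k, v in [("amenity", "hospital"), ("name", "Hospital Regional")]:
    ET.SubElement(node, "tag", k=k, v=v)

xml = ET.tostring(node, encoding="unicode")
print(xml)
```

You never need to hand-write this -- the point is only that every edit, however it's made, ends up as nodes and tags like these.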
JOSM allows for quick tracing of satellite images. You can simply turn on a satellite layer and start drawing the outlines of features found there, such as streets, building footprints, rivers, and forests. Using satellite imagery is a great way to create coverage fast. We've blogged before about how to do this. Here's a look at our progress tracing Brasilia in preparation for the OGP meetings:
OpenStreetMap contributions in Brasilia between April 5 and April 12.
In places where good satellite imagery isn't available, a GPS tracker goes a long way. OpenStreetMap offers a good comparison of GPS units. Whichever device you use, the basics are the same -- you track an area by driving or walking around and later load the data into JOSM, where you can clean it up, classify it, and upload it into OpenStreetMap.
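The GPX files those GPS units export are plain XML, which is why so many tools can read them. As a minimal sketch (the sample track and its coordinates are invented), here is how the track points can be pulled out with Python's standard library:

```python
import xml.etree.ElementTree as ET

# GPX files namespace their elements; this is the GPX 1.1 namespace.
GPX_NS = "{http://www.topografix.com/GPX/1/1}"

# A tiny inline GPX document standing in for a real file from a GPS unit.
sample = """<?xml version="1.0"?>
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="demo">
  <trk><trkseg>
    <trkpt lat="-15.7801" lon="-47.9292"/>
    <trkpt lat="-15.7805" lon="-47.9288"/>
  </trkseg></trk>
</gpx>"""

root = ET.fromstring(sample)
points = [(float(p.get("lat")), float(p.get("lon")))
          for p in root.iter(GPX_NS + "trkpt")]
print(points)
```

JOSM reads GPX directly, so a script like this is only useful if you want to inspect or pre-filter a track yourself before importing it.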
Synchronizing your camera with the GPS unit.
For our survey in Brasilia, we used Walking Papers: simple printouts of OpenStreetMap that let you jot down notes on paper. This is a great tool for on-the-ground surveys to gather street names and points of interest. It's as simple as you'd imagine. You walk or drive around a neighborhood and note any information you see that's missing from OpenStreetMap. Check out the report of our efforts doing this in Brasilia on our blog.
Walking papers for Brasilia.
For more details on how to contribute to OpenStreetMap, check out Learn OSM -- it's a great resource with step-by-step guides for the most common OpenStreetMap tasks. Also feel free to send us questions directly via @mapbox.
O'Reilly Radar :: MIT's recent Civic Media Conference and the latest batch of Knight News Challenge winners made one reality crystal clear: as a new era of technology-fueled transparency, innovation, and open government dawns, it won't depend on any single CIO or federal program. It will be driven by a distributed community of media, nonprofits, academics, and civic advocates focused on better outcomes, more informed communities, and the new news, whatever form it is delivered in.
Continue reading Alex Howard at radar.oreilly.com.
OpenGov NYC is an exciting unconference to be held in New York City. This one-day event will give a variety of civically engaged participants an opportunity to foster conversations about the relationship between participation, transparency, and efficiency.
This is the third event in a series of annual gatherings hosted by the Open NY Forum. These forums create a positive and productive space for government workers, technologists, entrepreneurs, and citizens to come together and engage with one another.
The event will be held on Sunday, June 5, 10am to 6pm at CUNY Graduate School of Journalism in New York City.
The day will revolve around three primary questions:
This is a must-attend for anyone passionate about technology, transparency, and government. Tickets are $15, and you can register here. Be sure to follow OpenGov Camp creators Open NY Forum on Twitter.
Open Data: everyone is talking about it, but how is it actually done?
We have tried to give a first answer to that question by writing a short handbook: http://tinyurl.com/pendataitalia
Our hope is that this modest work, to which several of our members contributed, can serve as a reference for public administrators, managers, and all those decision-makers who, convinced of the philosophy underpinning Open Government Data, have not yet found the toolbox to move from theory to concrete action.
Like any toolbox, this one can be filled with new tools and, thanks to new contributions, become a reference point for finally giving Italy too a strategy for "digital government."
What does Open Data mean? Why does Open Data represent a path toward Open Government, and why is Open Government an instrument of development? What are the main problems to tackle when you want to "do" Open Data? Which legal issues should be kept in mind? What are the technical aspects and the organizational impacts? We wanted to provide a first answer to these questions (and a few more), so that everyone can begin to understand why this topic is central to the country's development.
These guidelines follow the Manifesto for Open Government, which our association published last November. The next initiatives we plan to carry forward, thanks to the help of an ever-growing group of experts, will be announced in the coming days during several events we are working on.
In the meantime, anyone who wants to help improve this version can comment on the post. As always, we guarantee full attention to all observations, criticisms, and proposed improvements, which will be incorporated into the next version.
We thank the magazine eGov, which printed the first copies of this handbook to hand out to everyone attending today's award ceremony at Palazzo Marino.
Come Si Fa Open Data ("How to Do Open Data") – Version 1.0
The Department of Education (DOE) recently launched Maps.ed.gov/Broadband, an interactive map that shows schools across the country and the broadband Internet access speeds available near them. This is an important story for DOE, an agency with a stated goal that all students and teachers have access to a sufficient infrastructure for learning -- which nowadays includes a fast Internet connection. The map is based on open data released last month by the Federal Communications Commission (FCC). As you can see below, the result is a custom map that tells a unique story: how schools' Internet access compares across the country.
In addition to being an example of an open data mashup, this map also serves as an example of what can be built with emerging open-source mapping tools. We worked with DOE to process and merge the two data sets, generated the new map tiles using Mapnik, an open-source toolkit for rendering map tiles, and then designed the custom overlay of schools and universities in TileMill, our open-source map design studio, layering it on top of the broadband data.
It is great to see both the DOE and FCC able to leverage open data to make smarter policy decisions. Karen Cator, the director of the Office of Educational Technology at DOE, has an awesome blog post about why this mashup matters:
"The Department of Education's National Education Technology Plan sets a goal that all students and teachers will have access to a comprehensive infrastructure for learning, when and where they need it," Cator writes. "Broadband access is a critical part of that infrastructure. This map shows the best data to date and efforts will continue to gather better data and continually refresh the maps."
There is a flaw in the investigative reporting model and it has to do with longevity. Follow me on this for a second: A reporter works months at a time scouring documents, meeting sources, verifying details, writing, and perhaps even building a database. And then the piece is published.
And that’s it.
The lifespan of investigative reporting, at least as it’s typically done through newspapers, can be disappointingly short given the painful labor and birthing process. Once stories are released, the hope is the public (or perhaps lawmakers) will pick up the torch to right the wrongs illuminated by reporters. But the drumbeat stops after a while. Reporters have to move on to new assignments, and the public’s desire to change laws and right wrongs can be overtaken by things like #Winning.
In an ambitious new project, the Center for Public Integrity, Public Radio International, and Global Integrity are trying to build a new mechanism that keeps the intensity and awareness of investigative reporting at a steady pace.
What they’re building is a fifty-state corruption risk index. Think of it like a Homeland Security threat level indicator that shows just how susceptible your state is to corruption. Already this is no small task: They plan to hire a reporter in each state to do ground-level reporting and compile information for the index as well as write stories. Where they hope to transform the investigative reporting machine, though, is by going transparent and getting people invested in the project before it officially drops next year. Instead of holding onto information before the project is complete, they’ll invite the public in, ask for a little crowdsourcing, and build momentum — and a network. The goal is to make the corruption index something of a perpetual motion machine.
“The idea here is that in recent years really good, solid investigative reporting on the state level has fallen off, and state newspapers have had to make cutbacks,” said Caitlin Ginley, the project coordinator for CPI. “We see this as a great way to revitalize that.”
The corruption index is not without some precedent. In 2009, The Center for Public Integrity released States of Disclosure, a fifty-state ranking of financial disclosure laws for local legislators (and source for the map above). Ginley told me they wanted to build on that foundation for the corruption index, using financial disclosure laws, conflict-of-interest laws, FOIA regulations, lobbyist rules, and other accountability standards as indicators to gauge the likelihood of corruption.
“Reporters can take that information and see this is where [their state is] doing very poorly and report that out,” Ginley said.
In its role, Global Integrity will help by creating a methodology and guiding the analysis of the data that comes in. (Reporters will also be using Global Integrity’s Indaba tool to collect and publish information.) The end result will be much like States of Disclosure, with report cards and rankings, as well as background data from each state, Ginley said.
But the work that starts now, aside from the hiring of journalists in each state (JOB ALERT), is identifying people or organizations who can be helpful over the course of the corruption project.
“We have the tools now for people to get engaged in stories as they go along, and that creates a lasting commitment so it’s not a one-shot deal,” said Michael Skoler, vice president of Interactive Media for PRI. Just as important as finding reporters and document-hounding is cultivating a community that can guide and assist the reporting, Skoler told me. (Skoler is familiar with the concept, having established the Public Insight Network while working for American Public Media.)
“The standard mode for investigative reporting is that people don’t talk at all about what they’re doing,” he told me.
PRI will work with its more than 800 partner stations to find expertise and build interest in the project over the next twelve months so that, ideally, when the report is produced, there will be a built-in audience who can share it with others or try to minimize corruption in their state. Projects around government and budgets are ripe for crowdsourcing, but Skoler thinks crowdsourcing is a concept far too many people attempt but ultimately don’t understand.
“I think one of the misconceptions about crowdsourcing is when you crowdsource, you’re trying to attract and engage everyone. And that doesn’t work,” Skoler said. “Crowdsourcing is about reaching out to the people who are naturally interested and knowledgeable about something and inviting them to play.”
Within each state, he points out, there are honest government/open government groups, think tanks, academics, and non-profits who have an interest in state corruption and could assist in the project. Skoler thinks approaching these specific people and groups, unlike asking the general readership for help, could produce better results.
He also thinks that approach could help increase the reach of investigative reporting. Instead of hoping that the results a reporter produces will automatically take on a life of their own, the corruption index hopes to apply strategy to extending the shelf life of accountability journalism. As Skoler puts it, “It’s a new way of thinking about impact for investigative journalism — and about building impact in through a whole process.”
Hi everyone,
As part of the global SAP Corporate Social Responsibility team, I am responsible for managing our worldwide Technology Donation program, which provides free reporting and data visualization tools to over 900 non-profits each year in 15 countries. We have been partnering with TechSoup for quite a while now, and I am excited about the many possibilities to engage.
I am most interested in building capacity in the non-profit sector through technology. At SAP we can bring wide experience in business management, along with the skills of 60,000 employees around the world who want to contribute. We also have a large developer ecosystem as part of the SAP Community Network, and we have been supporting interesting work around impact measurement for non-profits and social enterprises through the Demonstrating Value Project.
What I am most interested in from collaborators is understanding how the different pieces of technology (hardware, networking, different software systems) can be integrated and easily consumed by non-profits. I'm also interested in going beyond traditional training on specific applications to helping organizations create strategies and build operational systems that can deliver better results. Finally I want to learn from, and share with, colleagues best practices on engaging employees with technology donations and how to embed these practices into the business so that CSR programs are not "off to the side" but a core part of operations.
You can find me on Twitter at @constructive and on my (infrequently updated) blog at http://www.constructive.net.
When I entered Stamen's offices in the Mission district of San Francisco, I saw four people gathered around a computer screen. What were they doing? Nothing less than "mapping the world" -- not as it appears in flat dimension, but how it reveals itself. And they weren't joking. Stamen, a data visualization firm, has always kept "place" central to many of their projects. They achieved this most famously through their crimespotting maps of Oakland and San Francisco, which give geographical context to the world of crime. This week they are taking on a world-sized challenge as they host a conference that focuses on cities, interactive mapping, and data.
This conference is part of Citytracking, a Stamen project funded by a Knight News Challenge grant that aims to provide the public with new tools for interacting with data about urban environments. The first part of this project, called Dotspotting, is startling in its simplicity. While still in early beta, it aims to create a baseline map by placing linkable dots on locations to yield data sets. The basic idea is to strike a balance between the free, but ultimately not-yours, nature of Google Maps and the infinitely malleable, but overly nerdy, open-source stacks that are out there.
With government agencies increasingly expected to operate within expanded transparency guidelines, San Francisco passed the nation's first open data law last fall, and many other U.S. cities have started to institutionalize this type of disclosure. San Francisco's law is basic and seemingly non-binding. It states that city departments and agencies "shall make reasonable efforts" to publish any data under their control, as long as the data does not violate other laws, in particular those related to privacy. After the law was passed unanimously by the Board of Supervisors (no small feat in this terminally fractious city), departments began uploading data at a significant rate to our data clearinghouse website, datasf. While uploading data to these clearinghouses is the first step, finding ways to truly institutionalize the process has been challenging.
Why should we care about open data? And why should we want to interact with it?
While some link the true rise of the open data movement to the most recent recession, the core motivation behind it has always been inherent in the nature of a citizenry: active citizenship. Open data in this sense can mean the right to understand the social, cultural, and societal forces constantly at play around us. As simultaneously the largest consumers and producers of data, cities have a responsibility to engage their citizens with this information. Gabriel Metcalf, executive director of SPUR (San Francisco Planning and Urban Research), and I wrote more about this in our 2010 year-in-review guide.
Stamen's Citytracking project wants to make that information accessible not just to software developers, but at a level of sophistication that allows for both real analysis and widespread participation. Within the scope of this task, Stamen is attempting to bring together democracy, technology, and design.
Why is this conference important?
Data and Cities brings together city officials, data visualization experts, technology fiends, and many others who fill in the gaps between these increasingly related fields.
Stamen has also designed this conference to have a mixture of formats, from practical demonstrations to political discussions to highly technical talks.
According to Eric Rodenbeck, Stamen's founder and CEO, "This is an exciting time for cities and data, where the literacy level around visualization seems to be rising by the day and we see huge demand and opportunity for new and interesting ways for people to interact with their digital civic infrastructure. And we're also seeing challenges and real questions on the role that cities take in providing the base layer of services and truths that we can rely on. We want to talk about these things in a setting where we can make a difference."
Data and Cities will take place February 9-11 and is invitation-only. In case you haven't scored an invitation, I'll be blogging about it all week.
Selected Speakers:
Jen Pahlka from Code for America - inserting developers into city IT departments across the country to help them mine and share their data.
Adam Greenfield from http://urbanscale.org/ and author of Everyware. Committed to applying the toolkit and mindset of interaction design to the specific problems of cities.
Jay Nath, City of San Francisco
http://www.jaynath.com/2010/12/why-sf-should-adopt-creative-commons, http://datasf.org
In Italy too there is a great deal of interest in Open Government, and the many responses we have received in this first week of "collaborative writing" of the Manifesto are proof of it.
Often, though, the term Open Government raises suspicion: perhaps it's the English (with which Italians are not very comfortable), perhaps the fear that it is just a label, a passing fad destined to be forgotten within a few months.
A few days ago I was invited to speak about it by the Radicali Italiani and, in a short talk that I include below, I tried to explain what lies behind this expression and why, in my view, it is not a passing fad.
What do you think?
Open Government: un nuovo paradigma from Ernesto Belisario on Vimeo.
Dear colleague working in the Italian public administration,
Perhaps you are wondering whether there is really any need for a new manifesto, or whether it would be enough to apply the mountain of laws and directives lying inert and abandoned on the shelves of the public administration.
I believe that, first of all, we need to share many ideas with those like us who work in public administrations; and since rules are usually endured rather than chosen, it is perhaps more useful to start from convergence on an active, positive way of thinking in pursuit of the goal of a genuinely open government.
If you understand an idea and the values it carries, it is easier to find the way to apply it concretely, regardless of the strictly legislative aspects. Those will be needed too, certainly, but the starting point is the vision one has of the administration's role with respect to citizens, businesses, and the network.
Yes, the network -- or perhaps we should write "the Net," which is not just connectivity, bits, information, and services. The Net with a capital N is something that generates value that was not there before, enabled by people and above all by the applications people create. But this extraordinary opportunity needs fertile ground that Italy still lacks.
This is the first thing we will have to agree on, dear colleague: what are the "essential nutrients" that make our ground truly fertile.
When we make information and services available on the Internet in accessible formats, we are only doing part of our duty. It's true, I admit it: we are complying with the requirements and constraints of the CAD, the Stanca law, and the latest directive on public administration websites. And probably many citizens will already be satisfied with that, as will most of the bureaucrats around us.
But we can do much better. We have the chance to add our small share of fertilizer to the soil of the network, waiting for a seed smarter than the others to put down roots, drawing on the information we left there. How, you may ask. By treating information as if it were a basic infrastructure of the intangible economy (Article 4 of the manifesto).
It is simple and not even too expensive, but it presupposes an awareness that is still not widespread. There are fundamentally four issues to understand: formats, licenses, updates, and access.
To do a good job, we will have to make our administration's data available in open formats, suitable for interpretation by software and not just by humans. For example, if we publish the list of public Wi-Fi areas in our city, with the location of the antennas, it is not enough to present a nice image of the map of installations. That will be useful information for tourists and residents, but it will not be reusable in any way by software. If, however, we also attached to the map a simple table presenting the same data with addresses or geographic coordinates, in a format like CSV, the kind you can export even from an ordinary Excel spreadsheet, our information would become fertile ground. A company or an enthusiast could collect all data of this kind from Italian cities and build an always up-to-date map of the country's public Wi-Fi areas. Or they could connect data on the presence of radio antennas with the incidence of certain ailments among residents of a specific area, provided the local health authority made those data available as well.
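As a hedged sketch of the open-format idea described above, here is what such a hotspot table could look like as a small CSV file, and how trivially software can reuse it once it exists. All names, addresses, and coordinates below are invented for illustration.

```python
import csv
import io

# A machine-readable version of the Wi-Fi hotspot map: one row per
# antenna, with a name, an address, and geographic coordinates.
data = """name,address,lat,lon
Piazza Duomo,Piazza del Duomo 1,45.4641,9.1919
Biblioteca Centrale,Via Roma 10,45.4700,9.1850
"""

# Any program can now parse the table and reuse the locations,
# e.g. to plot them on a nationwide map.
hotspots = list(csv.DictReader(io.StringIO(data)))
for h in hotspots:
    print(h["name"], float(h["lat"]), float(h["lon"]))
```

This is the whole point of the "open formats" requirement: the map image serves humans, but two columns of coordinates in CSV serve every other application that comes along.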
It is immediately clear that the periodic updating of this information is one of the key elements, together with easy accessibility over the network. It would be enough to create a dedicated web space collecting all of our administration's "liberated" data and wait for a seed to take root.
One element is still missing, though, and it is fundamental. Who owns the data we are going to make available? How can a citizen, a company, or an association know they can use it freely? We will have to accompany this data with a license that permits its use in full compliance with the law. It's easy: the problem has already been tackled successfully at the international level, and there are now simple, practical Italian formulas as well, such as the Italian Open Data License v1.0. (For more on licenses, see Ernesto Belisario's post.) All that remains, then, is to state clearly which license we use.
We spoke earlier about the intangible economy of the network, meaning we can assume that besides generating public value, someone might even make money from it. Would that be a problem? Frankly, I really don't think so. If public data is online and someone uses it intelligently to extract something useful, so much the better: the market, and above all people, will decide whether that product is worth paying for (Article 5).
At this point you may be wondering where the catch is: if it's all so simple, why haven't we done it already? The answer is simple: we still have to absorb the concept underlying all this concrete action. We need a new model of transparency (Article 3). Today the rules tell us: "you must make transparent everything listed below." It's a long list, and with the latest provisions the items are many, true, but what we are talking about is completely different. The idea the manifesto proposes could be summed up like this: "make transparent everything that is not expressly forbidden by law to protect citizens' privacy, and use all the resources at your disposal to inform and involve them, because collective intelligence can give you great results" (Articles 6 and 7).
Be careful though, dear colleague: if you feel you share some of these ideas, know that you may end up handling material that some consider inconvenient or unsuitable. Not you, of course, but many administrators and bureaucrats who believe it is better that certain things "not be known in too much detail": environmental data, pollution, noise, spending, investments, and so on. They will therefore keep asking you to publish polished reports and press releases on the web, bulky PDFs to download, thinking that the fruit of their careful processing can satisfy the transparency requirement.
Well, that is no longer the case today, and all of us here share the idea that in this network, increasingly populated with applications, we will gladly welcome the fruits of collective intelligence, if they can simplify our lives and help us shape the society we live in.
The Guardian takes data journalism seriously. They obtain, format, and publish journalistically interesting data sets on their Data Blog, they track transparency initiatives in their searchable index of world government data, and they do original research on data they’ve obtained, such as their amazing in-depth analysis of 90,000 leaked Afghanistan war documents. And they do most of this with simple, free tools.
Data Blog editor Simon Rogers gave me an action-packed interview in The Guardian’s London newsroom, starting with story walkthroughs and ending with a philosophical discussion about the changing role of data in journalism. It’s a must-watch if you’re wondering what the digitization of the world’s facts means for a newsroom. Here’s my take on the highlights; a full transcript is below.
The technology involved is surprisingly simple, and mostly free. The Guardian uses public, read-only Google Spreadsheets to share the data they’ve collected, which require no special tools for viewing and can be downloaded in just about any desired format. Visualizations are mostly via Many Eyes and Timetric, both free.
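To make the "no special tools" claim concrete, here is a minimal sketch of consuming a published spreadsheet as CSV. The export URL pattern shown is an assumption about Google's API (it has changed over the years), the sheet key is a placeholder, and the payload is a fabricated stand-in so the example doesn't need network access.

```python
import csv
import io

# Assumed URL pattern for downloading a published Google Spreadsheet
# as CSV; check Google's current documentation before relying on it.
def export_url(sheet_key):
    return ("https://docs.google.com/spreadsheets/d/"
            + sheet_key + "/export?format=csv")

# In practice you would fetch export_url(key) over HTTP; here we
# parse an invented payload to show how ordinary the result is.
payload = "company,ceo_pay\nExampleCorp,1200000\n"
rows = list(csv.DictReader(io.StringIO(payload)))
print(export_url("PLACEHOLDER_KEY"), rows[0]["company"])
```

The broader point stands regardless of the exact URL: a shared spreadsheet is just a CSV endpoint away from any scripting language's standard library.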
Data Blog posts often accompany or support news stories, but not always. Rogers sees the publishing of interesting data as a journalistic act that stands alone, and is clear on where the newsroom adds value:
I think you have to apply journalistic treatment to data. You have to choose the data in a selective, editorial fashion. And I think you have to process it in a way that makes it easy for people to use, and useful to people.
The Guardian curates far more data than it creates. Some data sets are generated in-house, such as its yearly executive pay surveys, but more often the data already exists in some form, such as a PDF on a government web site. The Guardian finds such documents, scrapes the data into spreadsheets, cleans it, and adds context in a Data Blog post. But they also maintain an index of world government data which scrapes open government web sites to produce a searchable index of available data sets.
“Helping people find the data, that’s our mission here,” says Rogers. “We want people to come to us when they’re looking for data.”
In alignment with their open strategy, The Guardian encourages re-use and mashups of their data. Readers can submit apps and visualizations that they’ve created, but data has proven to be just as popular with non-developers — regular folks who want the raw information.
Sometimes readers provide additional data or important feedback, typically through the comments on each post. Rogers gives the example of a reader who wrote in to say that the Academy schools listed in his area in a Guardian data set were in wealthy neighborhoods, raising the journalistically interesting question of whether wealthier schools were more likely to take advantage of this charter school-like program. Expanding on this idea, Rogers says,
What used to happen is that we were the kind of gatekeepers to this information. We would keep it to ourselves. So we didn’t want our rivals to get ahold of it, and give them stories. We’d be giving stories away. And we wouldn’t believe that people out there in the world would have any contribution to make towards that.
Now, that’s all changed now. I think now we’ve realized that actually, we’re not always the experts. Be it Doctor Who or Academy schools, there’s somebody out there who knows a lot more than you do, and can thus contribute.
So you can get stories back from them, in a way…If you put the information out there, you always get a return. You get people coming back.
Perhaps surprisingly, data also gets pretty good traffic, with the Data Blog logging a million hits a month during the recent election coverage. “In the firmament of Guardian web sites that’s not bad. That’s kind of upper tier,” says Rogers. “And this is only after being around for a year.” (The even younger Texas Tribune also finds its data pages popular, accounting for a third of total page views.)
Rogers and I also discussed the process of getting useful data out of inept or uncooperative governments, the changing role of data specialists in the newsroom, and how the Guardian tapped its readers to produce the definitive database of Doctor Who villains. Here’s the transcript, lightly edited.
JS: All right. So. I’m here with Simon Rogers in the Guardian newsroom in London, and you’re the editor of the Data Blog.
SR: That’s right, and I’m also a news editor so I work across the organization on data journalism, essentially.
JS: So, first of all, can you tell us what the Data Blog is?
SR: Ok, well basically it came about because, as I said, I was a news editor working a lot with graphics, and we realized we were just collecting enormous amounts of data. And we thought, well wouldn’t our readers be interested in seeing that? And when the Guardian Open Platform launched, it seemed a good time to think about opening up– we were opening up the Guardian to technical development, so it seemed a good time to open up our data collections as well.
And also it’s the fact that increasingly we’ve found people are after raw information. If you looked– and there’s lots of raw information online, but if you start searching for that information you just get bewildering amounts of replies back. If you’re looking for, say, carbon emissions, you get millions of entries back. So how do you know what the right set of data is? Whereas we’ve already done that set of work for our readers, because we’ve had to find that data, and we’ve had to choose it, and make an editorial selection about it, I suppose. So we thought we were able to cut out the middle man for people.
But also we kind of thought when we launched it, actually, what we’d be doing is creating data for developers. There seemed to be a lot of developers out there at that point who were interested in raw information, and they would be the people who would use the data blog, and the open platform would get a lot more traffic.
And what actually happened, what’s been interesting about it, is that– what’s actually happened is that it’s been real people who have been using the Data Blog, as much as developers. Probably more so than developers.
JS: What do you mean “real people”?
SR: Real people, I suppose what I mean is, somebody who’s just interested in finding out what a number is. So for instance, here at the moment we’ve got a big story about a government scheme for building schools, which has just been cut by the new government. It was set up by the old government, who invested millions of pounds into building new school buildings. And so we’ve got the full list of all the schools, plus the parliamentary constituency that they’re in, and where they are and what kind of project they were. And that is really, really popular today, that’s one of our biggest things, because there’s a lot of demonstrations about it, it’s a big issue of the day. And so I would guess that 90% of people looking at it are just people who want to find out what the real raw data is.
And that’s the great thing about the internet, it gives you access to the raw, real information. And I think that’s what people really crave. They want the interpretation and the analysis from people, but they also want the veracity of seeing the real thing, without having it aggregated or put together. They just want to see the raw data.
JS: So you publish all of the original numbers that you get from the government?
SR: Well exactly. The only time– with the Data Blog, I try to make it as newsy as possible. So it’s often hooked around news stories of the day. Partly because it helps the traffic, and you’re kind of hooking on to existing requirements.
Obviously we do– it’s just a really eclectic mix of data. And I can show you the screen, for a sec.
JS: All right. Let’s see something.
SR: Okay, so this is the data blog today. So obviously we’ve got Afghanistan at the top. Afghanistan is often at the top at the moment. This is a full list of everybody who’s died, every British casualty who’s died and been wounded over time. So you’ve got this data here. We use, I tend to use a lot of third party services. This is a company called Timetric, who are very good at visualizing time series data. It takes about five minutes to create that, and you can roll over and get more information.
JS: So is that a free service?
SR: Yeah, absolutely free, you just sign up, and you share it. It works a bit like Many Eyes, you know the IBM service.
JS: Yeah.
SR: We’ll embed these Google docs. We use Google docs, Google spreadsheets, to share all our information because it’s very easy for people to download it. So say you want to download this data. You click on the link, and it will take you through in a second to, there you go, it’s the full Google spreadsheet. And you’ve got everything on here. You’ve got, these are monthly totals, which you can’t get anywhere else, because nobody else does that information.
JS: What do you mean nobody else does it?
SR: Well nobody else bothers to put it together month by month. You can get totals by year from, iCasualties I think do it, but we’ve just collected some month by month, because often we’ve had to draw graphics where it’s month by month. It’s the kind of thing, actually it’s quite interesting to be able to see which month was the worst for casualties.
We’ve got lists of names, which obviously are in a few places. We collect Afghanistan wounded statistics which are terribly confused in the UK, because what they do is they try and make them as complicated as possible. So, the most serious ones, NOTICAS is where your next of kin is notified. That’s a serious event, but also you’ve got all those people evacuated. So anyway, this kind of data. We also keep amputation data, which is a new set that the government refused to release until recently, and a Guardian reporter was instrumental in getting this data released. So we kind of thought, maybe we should make this available for people.
So you get all this data, and then what you can do, if you click on “File” there, you can download it as Excel, XML, CSV, or whatever format you want. So that’s why we use Google spreadsheets. It’s a very, very easily accessible format for people.
So really what we do is we try and encourage a community, a community to grow up around data and information. So every post has got a talk facility on it.
Anyway, going through it. So this is today’s Data Blog, where you’ve got Afghanistan, Academy schools in the UK. The schools are run by the state, pretty much.
JS: So just to clarify this for the American audience, what’s an Academy school?
SR: Ok, well basically in the UK most schools are state schools, that most children go to. State schools are, we all pay for them, they’re paid for out of our taxes. And they’re run at a local level, which obviously has its advantages because it means that you are, kind of, working to an area. What the new government’s proposing to do is allow any school that wants to to become an Academy. And what an Academy is is a school that can run its own finances, and its own affairs.
And what we’ve got is we’ve got the data, the government’s published the data — as a PDF of course because governments always publish everything as a PDF, in this country anyway — and what they give you, which we’ve scraped here, is a list of every school in the UK which has expressed an interest. So you’ve got the local authority here, the name of the school, type of school, the address, and the post code. Which is great, because that’s good data, and because it’s on a PDF we can get that into a spreadsheet quite easily.
JS: So did you have to type in all of those things from a PDF, or cut and paste them?
SR: Good god no. No, no, luckily we’ve got a really good editorial support team here who, thanks to the Data Blog, are becoming very experienced at getting data off of PDFs. Because every government department would much rather publish something as a PDF, so they can act as if they’re publishing the data but really it’s not open.
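The PDF-to-spreadsheet step the team performs is, at its simplest, text extraction followed by line parsing. A hedged sketch, assuming the PDF has already been dumped to plain text (for example with a tool such as pdftotext) and that each row’s columns are separated by runs of two or more spaces; the school names and layout here are invented for illustration:

```python
import csv
import io
import re

# Hypothetical text dump of a government PDF table; columns are
# separated by two or more spaces (a common pdftotext -layout result).
raw = """\
Kent         Sample Grammar School      ME1 1AA
Barking      Example Primary School     IG11 2BB
"""

# Split each non-empty line on runs of 2+ spaces to recover the columns.
rows = [re.split(r"\s{2,}", line.strip())
        for line in raw.splitlines() if line.strip()]

# Write the parsed rows out as CSV, ready to paste into a spreadsheet.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["local_authority", "school", "postcode"])
writer.writerows(rows)
print(buf.getvalue())
```

Real government PDFs are messier than this, of course, which is why the editorial support team’s experience matters; but the shape of the job is the same.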
JS: So that’s interesting, because in the UK and the US there’s this big government publicity about, you know, we’re publishing all this data.
SR: Absolutely.
JS: But you’re saying that actually–
SR: It’s not 100 percent yet. So, I’ll show you in a second that what they tend to do is just publish– most government departments still want to publish stuff as PDFs. They can’t quite get out of that thing. Or want to say, why would somebody want a spreadsheet? They don’t really get it. A lot of people don’t get it.
And, we wanted the spreadsheet so you can do stuff like this, which is a map of schools interested in becoming Academies by area. And because we have that raw data in spreadsheet form we can work out how many are in each area. You can see suddenly that this part of England, Kent, has 99 schools, which is the biggest in the country. And there’s only one area, Barking, down here in London, that has no schools applying at all.
And the government also always said that at the beginning that it would mainly be schools which weren’t “outstanding” would apply. But actually if you look at the figures, which again, we can do, the majority of them are outstanding schools. So they’re already schools which are good, which are applying to become academies. Which kind of isn’t the point. But that kind of analysis, that’s data journalism in a sense. It’s using the numbers to get a story, and to tell a story.
JS: And how long did that story take you to put together? To get the numbers, and do the graphics, and…?
SR: Well, I was helped a bit, because I had one of my helpers who works in editorial support get the data onto a spreadsheet. And in terms of creating the graphic we have a fantastic tool here, which was set up by one of our technical development team who are over there, and what it does is it allows you to paste a load of geographic data into this box, and you tell it what kind it is: parliamentary constituency, or local authority, or educational authority, or whatever other regional division we have in the UK, and it will draw a map for you. So this map here was drawn by computer, basically, and then one of the graphics guys helps sort out the labels and finesse it and make it look beautiful. But it saves you the hard work of coloring up all those things. So actually that took me maybe a couple of hours. In total.
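The aggregation feeding a map like that is straightforward once the data is in rows: count schools per area, then hand the totals to the mapping tool. A small sketch with invented example rows (the real counts come from the scraped government list):

```python
from collections import Counter

# Hypothetical (local_authority, school_name) rows, standing in for
# the scraped spreadsheet of schools expressing interest.
schools = [
    ("Kent", "School A"),
    ("Kent", "School B"),
    ("Barking", "School C"),
    ("Surrey", "School D"),
]

# Tally schools per area; these per-area totals are what get
# pasted into a choropleth tool to color the map.
counts = Counter(area for area, _ in schools)
for area, n in counts.most_common():
    print(area, n)
```

The same one-liner answers the follow-up questions in the interview, such as which constituency had the most projects cut: change what you count, not the code.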
JS: How about getting the data, how long did that take?
SR: Oh well luckily that data– you know the government makes the data available. But like I say, as a PDF file. So this is the government site, and that’s the list there, and you open it, it opens as a PDF. Because we’ll link to that.
But luckily the guys in the ESD [editorial services department] are very adept now, because of the Data Blog, at getting data into spreadsheets. So, you know they can do that in 20 minutes.
JS: So how many people are working on data overall, then?
SR: Well, in terms of– it’s my full time job to do it. I’m lucky in that I’ve got an awful lot of people around here who have got an interest who I can kind of go and nudge, and ask. It’s a very informal basis, and we’re looking to formalize that, at the moment. We’re working on a whole data strategy, and where it goes. So we’re hoping to kind of make all of these arrangements a bit more formal. But at the moment I have to fit into what other people are doing. But yeah, we’ve got a good team now that can help, and that’s really a unique thing.
So I was going through the Data Blog for you. So this is a typical, a weird day, so schools, and then we’ve got another schools thing because it’s a big schools day today. This is school building projects scrapped by constituency, the full list. Now, this is another one where the government didn’t make the data easily available. The Department for Education published a list of all the school projects that were going to be stopped when the government cut the funding, some of which is going towards creating Academy schools, which is why this is a bit of an issue in the country at the moment. And we wanted to know by constituency how it was working, so which MPs were having the most school projects cut in their constituency. And we couldn’t get that list out of the Department for Education, but one MP had lodged it with the House of Commons library. So we managed to get it from the House of Commons library. But it didn’t come in a good form, it came in a PDF again, so again we had to get someone from tech to sort it out for us.
But the great thing is that we can do something like this, which is a map of projects stopped by constituency, by MP. And most of the projects that were stopped were in Labour seats. As you know Labour are not in power at the moment. So we can do some of this sort of analysis, which is great. So there were 418 projects stopped in Labour seats, and 268 stopped in Conservative seats. So basically 40% of Labour MPs had at least one project stopped in their seat, compared to only 27% of Conservatives, and 24% of the Lib Dems who are in power at the moment.
JS: So would it be accurate to say the data drove this story, or showed this story, or…?
SR: Data showed this story, which is great, but there’s one caveat, because of course the raw numbers are never 100%: there were more projects going on in Labour areas because the previous government, which was Labour, set up the projects, and they gave more projects to Labour areas. So you can read it either way.
JS: And you said this in the story?
SR: We said this in the story. Absolutely. We always try and make the caveats available for people. So that’s a big story today, because there are demonstrations about it in London. You’ve come to us on a very education-centered day.
But there’s other stuff on the blog too. This is a very British thing. We did this because we thought it would be an interesting project to do. I had somebody in for a week and they didn’t have much to do so I got them to make a list of every Doctor Who villain ever.
JS: This was an intern project?
SR: This was an intern project. We kinda thought, yeah, we’ll get a bit of traffic. And we’ve never had so much involvement in a single piece ever. It’s had 500 retweets, and when you think most pieces will get 30 or 40, it’s kind of interesting. The traffic has been through the roof. And the great thing is, so we created–
JS: Ooh, what’s this? This is good.
SR: It’s quite an easy– we use ManyEyes quite a lot, which is very very quick to create lovely little graphics. And this is every single Doctor Who villain since the start of the program, and how many times they appear. So you see the Daleks lead the way in Doctor Who.
JS: Yeah, absolutely.
SR: Followed by the Cybermen, and the Master’s in there a lot. And there are lots of other little things. But we started off with about 106 villains in total, and now we’re up to– we put it out there and we said to people, we know this isn’t going to be the complete list, can you help us? And now we’ve got 212. So my weekend has basically been– I’ll show you the data sheet, it’s amazing. You can see the comments are incredible. You see these kinds of things, “so what about the Sea Devils? The Zygons?” and so on.
And I’ll show you the data set, because it’s quite interesting. So this is the data set. Again Google docs. And you can see over here on the right hand side, this is how many people looking at it at any one time. So at that moment there are 11 people looking on. There could be 40 or 50 people looking at any one moment. And they’re looking and they’re helping us make corrections.
JS: So, wait– this data set is editable?
SR: No, we haven’t made it editable, because we’ve had a bad experience with people coming to editable ones and mucking around, you know, putting swear words on stuff.
JS: So how do they help you?
SR: Well they’ll put stuff in the comments field and I’ll go in and put it on the spreadsheet. Because I want a sheet that people can still download. So we’re now up to 203. We’ve doubled the number of villains thanks to our readers. It’s Doctor Who. And it just shows we’re an eclectic– we’re a broad church on the Data Blog. Everything can be data. And that’s data. We’ve got the number of appearances per villain, and it’s a program that people really care about. And it’s about as British as it’s possible to get. But then we also have other stuff too– and there we go, crashed again.
JS: Well let me just ask you a few questions, and take this opportunity to ask you some broader questions. Because we can do this all day. And I have. I’ve spent hours on your data blog because I’m a data geek. But let’s sort of bring it to some general questions here.
SR: Okay. Go for it.
JS: So first of all, I notice you have the Data Blog, you also have the world data index.
SR: Yes. Now the idea of that was that, obviously lots of governments around the world have started to open up their data. And around the time that the British government was– a lot of developers here were involved in that project — we started to think, what can we do around this that would help people, because suddenly we’ve got lots of sites out there that are offering open government data. And we thought, what if we could just gather them all together into one place. So you’ve got a single search engine. And that’s how we set up the world data search. Sorry to point you at the screen again.
JS: No that’s fine, that’s fine.
SR: Basically, so what we did, we started off with just Australia, New Zealand, UK and America. And basically what this site does, is it searches all of these open government data sites. Now we’ve got Australia, Toronto in Canada, New Zealand, the UK, London, California, San Francisco, and data.gov.
So say you search for “crime,” say you’re interested in crime. There you go. So you come back here, you see you’ve got results here from the UK, London, you’ve got results from data.gov in America, San Francisco, New Zealand and Australia. Say you’re interested in just seeing– you live in San Francisco and you’re only interested in San Francisco results. You’ve three results. And there you go, you click on that.
And you’re still within the Guardian site because what we’re asking people to do is help us rank the data, and submit visualizations and applications. So we want people to tell us what they’ve done with the data.
But anyway if you go and click on that, and you click on “download,” and it will start downloading the data for you. Or, what it will do is take you to the terms and conditions. We don’t bypass any T&Cs. The T&C’s come alongside. But you click on that, you agree to that, and then you get the data. So we really try and make it easy for people. There you go. And this is the crime incidence data. Very variable. This is great because it’s KML files, so if you wanted to visualize that you get really great information. It’s all sorts of stuff. Sometimes it’s CSVs.
JS: What’s a KML file?
SR: So, Google Earth.
JS: Okay.
SR: Sorry. So, it’s mapping, a mapping file straight away.
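Since KML is plain XML, the placemarks in a file like that crime-incident feed can be read with nothing more than a standard library parser. A sketch using a minimal hand-written KML snippet in place of a real download; the incident name and coordinates are invented:

```python
import xml.etree.ElementTree as ET

# Minimal hand-written KML standing in for a real crime-incident file.
# The namespace is the standard OGC KML 2.2 one.
kml = """<?xml version="1.0"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Incident 1</name>
    <Point><coordinates>-122.42,37.77,0</coordinates></Point>
  </Placemark>
</kml>"""

ns = {"k": "http://www.opengis.net/kml/2.2"}
root = ET.fromstring(kml)

# Pull out each placemark's name and coordinates.
# KML coordinates are longitude,latitude,altitude.
incidents = []
for pm in root.findall(".//k:Placemark", ns):
    name = pm.find("k:name", ns).text
    lon, lat, _ = pm.find(".//k:coordinates", ns).text.split(",")
    incidents.append((name, float(lat), float(lon)))
print(incidents)
```

From there the points can be plotted, counted per area, or re-exported; the mapping work Rogers praises KML for is essentially free once the coordinates are out.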
SR: Okay, so one of the things we ask people to do is to submit visualizations and applications they’ve produced. So for instance, London has some very very good open data. If you haven’t looked around the Data Store, it’s really worth going to. And one of these things they do is they provide a live feed of all the London traffic cameras. You can watch them live. And this is a lovely thing, because what somebody’s done is they’ve written an iPad application. So you can watch live TFL, Transport for London, traffic cameras on your iPad.
And you see that data set has been rated. A couple of people have gone in there and rated it. You’ve got a download button, the download is XML. So we try and help people around this data. And this is growing now. Every time somebody launches an open government data site we’re gonna put it on here, and we’re working on a few more at the moment. So we want it to be the place that people go to. Every time you Google “world government data” it pops up at the top, which is what you want. You want people who are just trying to compare different countries and don’t know where to start, to help them find a way through this maze of information that’s out there.
JS: So do you intend to do this for every country in the world?
SR: Every country in the world that launches an open government data site, we’ll whack it on here. And we’re working– at the moment there are about 20 decent open government data sites around the world. We’re picking those up. We’ve got on here now, how many have we got? One, two, three, four, five, six, seven, eight. We’ll have 20 on in the next couple of weeks. We’re really working through them at the moment.
And what this does is, it scrapes them. So basically, we don’t– for us it’s easy to manage because we don’t have to update these data sets all the time. The computer does that for us. But basically, what we do provide people with is context and background information, because you’re part of the data site there.
JS: So let me make sure I have this clear. So you’re not sucking down the actual data, you’re sucking down the list and descriptions of the data sets available?
SR: Absolutely. So we’re providing people, because basically we want it to be as updated as possible. We don’t– if we just uploaded onto our site, that would kind of be pointless, and it would mean it would be out of date. This way, if something pops up on data.gov and stays there, we’ll get it quick on here. We’ll help people find it. Helping people find the data, that’s our mission here. It’s not just generating traffic, it’s to help people find the information, because we want people to come to us when they’re looking for data.
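The design Rogers describes, indexing scraped metadata rather than mirroring the data itself, can be sketched as a list of catalog entries with a search over titles and a source filter. The entries below are invented placeholders, not real index records:

```python
# Sketch of a metadata-only index: each entry holds only the title,
# source catalog, and a link back to the original, never the data.
index = [
    {"title": "Recorded crime by borough", "source": "London", "url": "..."},
    {"title": "Crime incidents", "source": "San Francisco", "url": "..."},
    {"title": "School spending", "source": "data.gov", "url": "..."},
]

def search(query, source=None):
    """Match the query against titles, optionally filtered by source."""
    q = query.lower()
    return [e for e in index
            if q in e["title"].lower()
            and (source is None or e["source"] == source)]

print([e["source"] for e in search("crime")])
print(search("crime", source="San Francisco"))
```

Because the index stores only pointers, a rescrape keeps it current without re-hosting anything, which is exactly the freshness property Rogers is after.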
JS: So, okay. You’ve talked about, it sounds like, two different projects. The Data Blog. where you collect and clean up and present data that you–
SR: That we find interesting. We’re selective.
JS: In the process of the Guardian’s newsgathering.
SR: Yeah, and just things that are interesting anyway. So the Doctor Who post that we were just looking at is just interesting to do. It’s not anything we’re going to do a story about. And often they’ll be things that are in the news, say that day, and I’ll think “oh that’s a good thing to put on the Data Blog.” So it could be crime figures, or it could be– and sometimes, the side effect of that is a great side effect because you end up with a piece in the paper, or a piece on the web site. But often it might be the Data Blog is the only place to get that information.
JS: And you index world government data sites.
SR: Yeah, absolutely.
JS: Does the Guardian do anything else with data?
SR: Yeah, well what we do is, we’re doing a lot of Guardian research with data. So what we want to do is give people a kind of way into that. So for instance, we do do a lot of data-based projects. So for instance we’re doing an executive pay survey of all the biggest companies, how much they pay their bosses and their chief executives. That has always been a thing the paper’s always done for stories. And now what we’ll do is we’ll make that stuff available– that data available for people. So instead of just raw data journalism, it’s quite old data journalism. We’ve been doing it for ten years. But we used to just call it a survey. Now it’s data journalism, because it’s getting stories out of numbers. So we’ll work with that, and we’ll publish that information for people to see. And there are a couple of big projects coming up this week, which I really can’t tell you about, but next week it will be obvious what they are.
JS: Probably by the time this goes up we’ll be able to link to them.
[Simon was referring to the Guardian's data journalism work on the leaked Afghanistan war logs, described in a thorough post on the Data Blog.]
SR: Yeah, I’ll mail you about them. But we’ve got now an area of expertise. So increasingly what I’m finding is that I’m getting people coming to me within The Guardian, saying, so we’ve got this spreadsheet, well how can I do this? So for instance that Academies thing we were just looking at, we were really keen to find out which areas were the most, where the most schools were, for the paper. The correspondent wanted to know that. So actually, because we’ve got this area of expertise now in managing data, we’re becoming kind of a go-to place within The Guardian, for journalists who are just writing stories where they need to know something, or they need to find some information out, which is an interesting side effect. Because it used to be that journalists were kind of scared of numbers, and scared of data. I really think that was the case. And now, increasingly, they’re trying to embrace that, and starting to realize you can get stories out of it.
JS: Well that’s really interesting. Let’s talk for a minute about how this applies to other newsrooms, because it’s– as you say, journalists have been traditionally scared of data.
SR: Yeah, absolutely. You could say they prided themselves, in this country anyway, they prided themselves on lack of mathematical ability. I would say.
JS: Which seems unfortunate in this era.
SR: Yeah, absolutely. Yeah, yeah, absolutely.
JS: But especially a lot of our readers are from smaller newsrooms, and so what kind of technical capability do you need to start tracking data, and publishing data sets?
SR: I think it’s really minimal. I mean, the thing is that most of the time we’re really just working with basic spreadsheet packages. Excel or whatever you’ve got. Excel is easy to use, but it could be any package really. And we’re using Google spreadsheets, which again are widely available for people to use. We’re using visualization tools, ManyEyes or Timetric, which are widely available and easy to use. I think what we’re doing is just bringing it together.
I think traditionally that journalists wouldn’t regard data journalism as journalism. It was research. Or, you know, how is publishing data– is that journalism? But I think now, what is happening is that actually, what used to happen is that we were the kind of gatekeepers to this information. We would keep it to ourselves. So we didn’t want our rivals to get ahold of it, and give them stories. We’d be giving stories away. And we wouldn’t believe that people out there in the world would have any contribution to make towards that. Now, that’s all changed now. I think now we’ve realized that actually, we’re not always the experts. Be it Doctor Who or Academy schools, there’s somebody out there who knows a lot more than you do, and can thus contribute. So you can get stories back from them, in a way. So we’re receiving the information much more.
JS: So you publish the data, and then other people build stories out of it, is that what you’re saying?
SR: Other people will let us know– well, we publish say, well that’s an interesting story, or this is a good visualization. We’ve published data for other people to visualize. We thought, that’s quite an interesting thing to mash it up with, we should do that ourselves. So there’s that thing, and there’s also the fact that if you put the information out there, you always get a return. You get people coming back.
So for instance the Academies thing today that we were talking about. We’ve had people come back saying, well I live in Derbyshire and I know that those schools are in quite wealthy areas. So we start to think, well is there a trend towards schools in wealthy areas going to this, and schools in poorer areas not going to this.
So it gives you extra stories or extra angles on stories you wouldn’t think of. And I think that’s part of it. And I think partly there’s just the realization that just publishing data in itself, because it’s interesting, is a journalistic enterprise. Because I think you have to apply journalistic treatment to that data. You have to choose the data in a selective, editorial fashion. And I think you have to process it in a way that makes it easy for people to use, and useful to people.
JS: So last question here, which is of course going to be on many editors’ and publishers’ minds.
SR: Sure.
JS: Let’s talk about traffic and money. How does this contribute to the business of The Guardian?
SR: Okay, it’s a new– it’s an experiment for us, but traffic-wise it’s been pretty healthy. We’ve had– during the election we were getting a million page impressions in a month. Which is not bad. On the Data Blog. Now, as a whole, out of the 36 million that The Guardian gets, it doesn’t seem like a lot. But actually, in the firmament of Guardian web sites that’s not bad. That’s kind of upper tier. And this is only after being around for a year.
So in terms of what it gives us, it gives us the same as anything that produces traffic gives us. It’s good for the brand, and it’s good for The Guardian site. In the long run, I think there is probably canny money to be made out there for organizations that can manage and interpret data. I don’t know exactly how, but I think we’d have to be pretty dumb if we didn’t come up with something. I’d be very surprised. It’s an area where there’s such a lot of potential. There are people who don’t really know how to manage data and don’t really know how to organize it, so there’s room for us to get involved in that area. I really think that.
But also I think that just journalistically, it’s as important to do this as it is to write a piece about a fashion week or anything else we might employ a journalist to do. And in a way it’s more important, because if The Guardian is about open information, which– since the beginning of The Guardian we’ve campaigned for freedom of information and access to information, and this is the ultimate expression of that.
And we, on the site, we use the phrase “facts are sacred.” And this comes from the famous C. P. Scott who said that “comment is free,” which as you know is the name of our comment site, but “facts are sacred” was the second part of the saying. And I kinda think that is– you can see it on the comment site, there you go. “Comment is free, but facts are sacred.” And that’s what The Guardian’s about. I really think that, you know, this says a lot about the web. Interestingly, I think that’s how the web is changing, in the sense that a few years ago it was just about comment. People wanted to say what they thought. Now I think it’s, increasingly, people want to find out what the facts are.
JS: All right, well, thank you very much for a thorough introduction to The Guardian’s data work.
SR: Thanks a lot.
Data Blog posts often relate to or support news stories, but not always. Rogers sees the publishing of interesting data as a journalistic act that stands alone, and is clear on where the newsroom adds value:
I think you have to apply journalistic treatment to data. You have to choose the data in a selective, editorial fashion. And I think you have to process it in a way that makes it easy for people to use, and useful to people.
The Guardian curates far more data than it creates. Some data sets are generated in-house, such as the Guardian’s yearly executive pay surveys, but more often the data already exists in some form, such as a PDF on a government web site. The Guardian finds such documents, scrapes the data into spreadsheets, cleans it, and adds context in a Data Blog post. But they also maintain an index of world government data which scrapes open government web sites to produce a searchable index of available data sets.
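That scrape-and-clean step is mundane but central to the workflow described here. As a rough illustration (not The Guardian's actual tooling), normalizing a scraped column of currency strings might look like this:

```python
# Hypothetical example: normalizing currency strings scraped from a
# PDF table (e.g. an executive pay survey) into plain numbers.
def clean_amount(raw):
    """Turn strings like ' £1,234,567 ' or '1.2m' into an integer number of pounds."""
    s = raw.strip().lstrip("£").replace(",", "")
    if s.lower().endswith("m"):          # shorthand like '1.2m'
        return int(float(s[:-1]) * 1_000_000)
    return int(float(s))

scraped = [" £1,234,567 ", "980,000", "1.2m"]   # made-up sample values
cleaned = [clean_amount(x) for x in scraped]
# cleaned == [1234567, 980000, 1200000]
```

Real scraped columns are messier, of course, but the principle is the same: each cleaning rule is an editorial decision about what the data should look like.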
“Helping people find the data, that’s our mission here,” says Rogers. “We want people to come to us when they’re looking for data.”
In alignment with their open strategy, The Guardian encourages re-use and mashups of their data. Readers can submit apps and visualizations that they’ve created, but data has proven to be just as popular with non-developers — regular folks who want the raw information.
Sometimes readers provide additional data or important feedback, typically through the comments on each post. Rogers gives the example of a reader who wrote in to say that the Academy schools listed in his area in a Guardian data set were in wealthy neighborhoods, raising the journalistically interesting question of whether wealthier schools were more likely to take advantage of this charter school-like program. Expanding on this idea, Rogers says,
What used to happen is that we were the kind of gatekeepers to this information. We would keep it to ourselves. So we didn’t want our rivals to get ahold of it, and give them stories. We’d be giving stories away. And we wouldn’t believe that people out there in the world would have any contribution to make towards that.
Now, that’s all changed now. I think now we’ve realized that actually, we’re not always the experts. Be it Doctor Who or Academy schools, there’s somebody out there who knows a lot more than you do, and can thus contribute.
So you can get stories back from them, in a way. … If you put the information out there, you always get a return. You get people coming back.
Perhaps surprisingly, data also gets pretty good traffic, with the Data Blog logging a million hits a month during the recent election coverage. “In the firmament of Guardian web sites that’s not bad. That’s kind of upper tier,” says Rogers. “And this is only after being around for a year.” (The even younger Texas Tribune also finds its data pages popular, accounting for a third of total page views.)
Rogers and I also discussed the process of getting useful data out of inept or uncooperative governments, the changing role of data specialists in the newsroom, and how the Guardian tapped its readers to produce the definitive database of Doctor Who villains. Here’s the transcript, lightly edited.
JS: All right. So. I’m here with Simon Rogers in the Guardian newsroom in London, and you’re the editor of the Data Blog.
SR: That’s right, and I’m also a news editor so I work across the organization on data journalism, essentially.
JS: So, first of all, can you tell us what the Data Blog is?
SR: Ok, well basically it came about because, as I said I was a news editor working a lot with graphics, and we realized we were just collecting enormous amounts of data. And we thought, well wouldn’t our readers be interested in seeing that? And when the Guardian Open Platform launched, it seemed a good time to think about opening up– we were opening up the Guardian to technical development, so it seemed a good time to open up our data collections as well.
And also it’s the fact that increasingly we’ve found people are after raw information. If you looked– and there’s lots of raw information online, but if you start searching for that information you just get bewildering amounts of replies back. If you’re looking for, say, carbon emissions, you get millions of entries back. So how do you know what the right set of data is? Whereas we’ve already done that set of work for our readers, because we’ve had to find that data, and we’ve had to choose it, and make an editorial selection about it, I suppose. So we thought we were able to cut out the middle man for people.
But also we kind of thought when we launched it, actually, what we’d be doing is creating data for developers. There seemed to be a lot of developers out there at that point who were interested in raw information, and they would be the people who would use the data blog, and the open platform would get a lot more traffic.
And what actually happened, what’s been interesting about it, is that– what’s actually happened is that it’s been real people who have been using the Data Blog, as much as developers. Probably more so than developers.
JS: What do you mean “real people”?
SR: Real people, I suppose what I mean is that, somebody who’s just interested in finding out what a number is. So for instance, here at the moment we’ve got a big story about a government scheme for building schools, which has just been cut by the new government. It was set up by the old government, who invested millions of pounds into building new school buildings. And so, we’ve got the full list of all the schools, with the parliamentary constituency that they’re in, and where they are and what kind of project they were. And that is really, really popular today, that’s one of our biggest things, because there’s a lot of demonstrations about it, it’s a big issue of the day. And so I would guess that 90% of people looking at it are just people who want to find out what the real raw data is.
And that’s the great thing about the internet, it gives you access to the raw, real information. And I think that’s what people really crave. They want the interpretation and the analysis from people, but they also want the veracity of seeing the real thing, without having it aggregated or put together. They just want to see the raw data.
JS: So you publish all of the original numbers that you get from the government?
SR: Well exactly. The only time– with the Data Blog, I try to make it as newsy as possible. So it’s often hooked around news stories of the day. Partly because it helps the traffic, and you’re kind of hooking on to existing requirements.
Obviously we do– it’s just a really eclectic mix of data. And I can show you the screen, for a sec.
JS: All right. Let’s see something.
SR: Okay, so this is the data blog today. So obviously we’ve got Afghanistan at the top. Afghanistan is often at the top at the moment. This is a full list of everybody who’s died, every British casualty who’s died and been wounded over time. So you’ve got this data here. We use, I tend to use a lot of third party services. This is a company called Timetric, who are very good at visualizing time series data. It takes about five minutes to create that, and you can roll over and get more information.
JS: So is that a free service?
SR: Yeah, absolutely free, you just sign up, and you share it. It works a bit like Many Eyes, you know the IBM service.
JS: Yeah.
SR: We’ll embed these Google docs. We use Google docs, Google spreadsheets to share all our information because it’s very easy for people to download it. So say you want to download this data. You click on the link, and it will take you through in a second to, there you go, it’s the full Google spreadsheet. And you’ve got everything on here. You’ve got, these are monthly totals, which you can’t get anywhere else, because nobody else does that information.
JS: What do you mean nobody else does it?
SR: Well nobody else bothers to put it together month by month. You can get totals by year from, iCasualties I think do it, but we’ve just collected some month by month, because often we’ve had to draw graphics where it’s month by month. It’s the kind of thing, actually it’s quite interesting to be able to see which month was the worst for casualties.
We’ve got lists of names, which obviously are in a few places. We collect Afghanistan wounded statistics which are terribly confused in the UK, because what they do is they try and make them as complicated as possible. So, the most serious ones, NOTICAS is where your next of kin is notified. That’s a serious event, but also you’ve got all those people evacuated. So anyway, this kind of data. We also keep amputation data, which is a new set that the government refused to release until recently, and a Guardian reporter was instrumental in getting this data released. So we kind of thought, maybe we should make this available for people.
So you get all this data, and then what you can do, if you click on “File” there, you can download it as Excel, XML, CSV, or whatever format you want. So that’s why we use Google spreadsheets. It’s the kind of thing that’s a very, very easily accessible format for people.
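Rogers’s point about formats is easy to demonstrate. A published Google Sheet can be pulled down as CSV through a standard export URL; the sketch below uses a placeholder sheet ID and made-up rows, and parses the CSV with the standard library:

```python
import csv
import io

def sheet_csv_url(sheet_id, gid=0):
    # Widely used export-URL pattern for a published Google Sheet;
    # `sheet_id` is the long ID from the sheet's web address.
    return (f"https://docs.google.com/spreadsheets/d/{sheet_id}"
            f"/export?format=csv&gid={gid}")

# In practice you'd fetch that URL (e.g. with urllib.request.urlopen)
# and parse the response; the parsing step is the same on this toy text:
toy_csv = "month,total\n2009-07,22\n2009-08,19\n"   # made-up rows
rows = list(csv.DictReader(io.StringIO(toy_csv)))
# rows[0]["total"] == "22"
```

This is why a spreadsheet beats a PDF as a publication format: one URL gives any reader, or any program, the raw table.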
So really what we do is we try and encourage a community, a community to grow up around data and information. So every post has got a talk facility on it.
Anyway, going through it. So this is today’s Data Blog, where you’ve got Afghanistan, Academy schools in the UK. The schools are run by the state, pretty much.
JS: So just to clarify this for the American audience, what’s an Academy school?
SR: Ok, well basically in the UK most schools are state schools, that most children go to. State schools are, we all pay for them, they’re paid for out of our taxes. And they’re run at a local level, which obviously has its advantages because it means that you are, kind of, working to an area. What the new government’s proposing to do is allow any school that wants to to become an Academy. And what an Academy is is a school that can run its own finances, and its own affairs.
And what we’ve got is we’ve got the data, the government’s published the data — as a PDF of course because governments always publish everything as a PDF, in this country anyway — and what they give you, which we’ve scraped here, is a list of every school in the UK which has expressed an interest. So you’ve got the local authority here, the name of the school, type of school, the address, and the post code. Which is great, because that’s good data, and even though it’s on a PDF we can get that into a spreadsheet quite easily.
JS: So did you have to type in all of those things from a PDF, or cut and paste them?
SR: Good god no. No, no, we have, luckily we’ve got a really good editorial support team here, who are, thanks to the Data Blog, are becoming very experienced at getting data off of PDFs. Because every government department would much rather publish something as a PDF, so they can act as if they’re publishing the data but really it’s not open.
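The extraction work Rogers describes usually starts from raw text pulled out of the PDF. As a sketch (not the Guardian team's actual pipeline), assuming the text has already been extracted with a tool such as pdftotext in layout mode, splitting it into columns can be as simple as treating runs of spaces as separators:

```python
import re

def rows_from_layout_text(text):
    """Split text extracted from a PDF table (e.g. with `pdftotext -layout`)
    into rows, treating runs of two or more spaces as column breaks."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        rows.append(re.split(r"\s{2,}", line.strip()))
    return rows

# Hypothetical extracted text; the school names are invented.
raw = (
    "Kent        Anytown High      Secondary   ME1 1AA\n"
    "Barking     Riverside School  Primary     IG11 7XX\n"
)
table = rows_from_layout_text(raw)
# table[0] == ["Kent", "Anytown High", "Secondary", "ME1 1AA"]
```

Real PDFs need more care (wrapped cells, headers, page breaks), which is why an experienced support team matters.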
JS: So that’s interesting, because in the UK and the US there’s this big government publicity about, you know, we’re publishing all this data.
SR: Absolutely.
JS: But you’re saying that actually–
SR: It’s not 100 percent yet. So, I’ll show you in a second that what they tend to do is just publish– most government departments still want to publish stuff as PDFs. They can’t quite get out of that thing. Or want to say, why would somebody want a spreadsheet? They don’t really get it. A lot of people don’t get it.
And, we wanted the spreadsheet so you can do stuff like this, which is, this is a map of schools interested in becoming Academies by area. And so because we have that raw data in spreadsheet form we can work out how many are in each area. You can see suddenly that this part of England, Kent, has 99 schools, which is the biggest in the country. And only one area, which is Barking, up here, in London, which is, sorry, is down here in London, but anyway that has no schools applying at all.
And the government also said at the beginning that it would mainly be schools which weren’t “outstanding” that would apply. But actually if you look at the figures, which again, we can do, the majority of them are outstanding schools. So they’re already schools which are good, which are applying to become academies. Which kind of isn’t the point. But that kind of analysis, that’s data journalism in a sense. It’s using the numbers to get a story, and to tell a story.
JS: And how long did that story take you to put together? To get the numbers, and do the graphics, and…?
SR: Well, I was helped a bit, because I got, I’ve had one of my helpers who works in editorial support to get the data onto a spreadsheet. And in terms of creating the graphic we have a fantastic tool here, which is set up by one of our technical development team who are over there, and what it does, is it allows you to paste a load of data, geographic data, into this box, and you tell it what kind, is it parliamentary constituency, or local authority, or educational authority, or whatever the different regional differentiations we have in the UK, and it will draw a map for you. So this map here was drawn by computer, basically, and then one of the graphics guys helped sort out the labels and finesse it and make it look beautiful. But it saves you the hard work of coloring up all those things. So actually that took me maybe a couple of hours. In total.
JS: How about getting the data, how long did that take?
SR: Oh well luckily that data– you know the government makes the data available. But like I say, as a PDF file. So this is the government site, and that’s the list there, and you open it, it opens as a PDF. And we’ll link to that.
But luckily the guys in the ESD [editorial services department] are very adept now, because of the Data Blog, at getting data into spreadsheets. So, you know they can do that in 20 minutes.
JS: So how many people are working on data overall, then?
SR: Well, in terms of– it’s my full time job to do it. I’m lucky in that I’ve got an awful lot of people around here who have got an interest who I can kind of go and nudge, and ask. It’s a very informal basis, and we’re looking to formalize that, at the moment. We’re working on a whole data strategy, and where it goes. So we’re hoping to kind of make all of these arrangements a bit more formal. But at the moment I have to fit into what other people are doing. But yeah, we’ve got a good team now that can help, and that’s really a unique thing.
So I was going through the Data Blog for you. So this is a typical, a weird day, so schools, and then we’ve got another schools thing because it’s a big schools day today. This is school building projects scrapped by constituency, full list. Now, this is another one where the government didn’t make the data easily available. The department for education published a list of all the school projects that were going to be stopped when the government cut the funding, some of which is going towards creating Academy schools, which is why this is a bit of an issue in the country at the moment. And we want to know by constituency how it was working. So which MPs were having the most school projects cut, in their constituency. And we couldn’t get that list out of the department of education, but one MP had lodged it with the House of Commons library. So we managed to get it from the House of Commons library. But it didn’t come in a good form, it came in a PDF again, so again we had to get someone from tech to sort it out for us.
But the great thing is that we can do something like this, which is a map of projects stopped by constituency, by MP. And most of the projects stopped were in Labour seats. As you know Labour are not in power at the moment. So we can do some of this sort of analysis which is great. So there were 418 projects stopped in Labour constituencies, and 268 stopped in Conservative seats. So basically 40% of Labour MPs had a project stopped, at least one project stopped in their seat, compared to only 27% of Conservatives, and 24% of the Lib Dems who are in power at the moment.
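The party breakdown Rogers describes is, at bottom, a count-and-share computation over a seat list. A minimal sketch with toy data, not the real constituency figures:

```python
from collections import Counter

def share_affected(seats):
    """For each party, the share of its seats with at least one project stopped.
    `seats` is a list of (party, projects_stopped) pairs, one per constituency."""
    total = Counter(party for party, _ in seats)
    hit = Counter(party for party, n in seats if n > 0)
    return {party: hit[party] / total[party] for party in total}

# Toy data, invented for illustration:
seats = [("Labour", 2), ("Labour", 0), ("Labour", 1), ("Labour", 3),
         ("Conservative", 0), ("Conservative", 1)]
shares = share_affected(seats)
# shares["Labour"] == 0.75, shares["Conservative"] == 0.5
```

The journalism is in the comparison and the caveats, but the arithmetic itself is a few lines once the data is in a clean table.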
JS: So would it be accurate to say the data drove this story, or showed this story, or…?
SR: Data showed this story, which is great, but the one thing, the caveat — of course, the raw numbers are never 100% — the caveat was there were more projects going on in Labour areas because the previous government, which was Labour, set up the projects, and they gave more projects to Labour areas. So you can read it either way.
JS: And you said this in the story?
SR: We said this in the story. Absolutely. We always try and make the caveats available for people. So that’s a big story today, because there are demonstrations about it in London. You’ve come to us on a very education-centered day today.
But there’s other stuff on the blog too. This is a very British thing. We did this because we thought it would be an interesting project to do. I had somebody in for a week and they didn’t have much to do so I got them to make a list of every Doctor Who villain ever.
JS: This was an intern project?
SR: This was an intern project. We kinda thought, yeah, we’ll get a bit of traffic. And we’ve never had so much involvement in a single piece ever. It’s had 500 retweets, and when you think most pieces will get 30 or 40, it’s kind of interesting. The traffic has been through the roof. And the great thing is, so we created–
JS: Ooh, what’s this? This is good.
SR: It’s quite an easy– we use ManyEyes quite a lot, which is very very quick to create lovely little graphics. And this is every single Doctor Who villain since the start of the program, and how many times they appear. So you see the Daleks lead the way in Doctor Who.
JS: Yeah, absolutely.
SR: Followed by the Cybermen, and the Master’s in there a lot. And there are lots of other little things. But we started off with about 106 villains in total, and now we’re up to– we put it out there and we said to people, we know this isn’t going to be the complete list, can you help us? And now we’ve got 212. So my weekend has basically been– I’ll show you the data sheet, it’s amazing. You can see the comments are incredible. You see these kinds of things, “so what about the Sea Devils? The Zygons?” and so on.
And I’ll show you the data set, because it’s quite interesting. So this is the data set. Again Google docs. And you can see over here on the right hand side, this is how many people are looking at it at any one time. So at that moment there are 11 people looking on. There could be 40 or 50 people looking at any one moment. And they’re looking and they’re helping us make corrections.
JS: So, wait– this data set is editable?
SR: No, we haven’t made it editable, because we’ve had a bad experience with people coming to editable ones and mucking around, you know, putting swear words on stuff.
JS: So how do they help you?
SR: Well they’ll put stuff in the comments field and I’ll go in and put it on the spreadsheet. Because I want a sheet that people can still download. So now we’ve got, we’re now up to 203. We’ve doubled the amount of villains thanks to our readers. It’s Doctor Who. And it just shows we’re an eclectic– we’re a broad church on the Data Blog. Everything can be data. And that’s data. We’ve got number of appearances per villain, and it’s a program that people really care about. And it’s about as British as it’s possible to get. But then we also have other stuff too– and there we go, crashed again.
JS: Well let me just ask you a few questions, and take this opportunity to ask you some broader questions. Because we can do this all day. And I have. I’ve spent hours on your data blog because I’m a data geek. But let’s sort of bring it to some general questions here.
SR: Okay. Go for it.
JS: So first of all, I notice you have the Data Blog, you also have the world data index.
SR: Yes. Now the idea of that was that, obviously lots of governments around the world have started to open up their data. And around the time that the British government was– a lot of developers here were involved in that project — we started to think, what can we do around this that would help people, because suddenly we’ve got lots of sites out there that are offering open government data. And we thought, what if we could just gather them all together into one place. So you’ve got a single search engine. And that’s how we set up the world data search. Sorry to point you at the screen again.
JS: No that’s fine, that’s fine.
SR: Basically, so what we did, we started off with just Australia, New Zealand, UK and America. And basically what this site does, is it searches all of these open government data sites. Now we’ve got Australia, Toronto in Canada, New Zealand, the UK, London, California, San Francisco, and data.gov.
So say you search for “crime,” say you’re interested in crime. There you go. So you come back here, you see you’ve got results here from the UK, London, you’ve got results from data.gov in America, San Francisco, New Zealand and Australia. Say you’re interested in just seeing– you live in San Francisco and you’re only interested in San Francisco results. You’ve got three results. And there you go, you click on that.
And you’re still within the Guardian site because what we’re asking people to do is help us rank the data, and submit visualizations and applications. So we want people to tell us what they’ve done with the data.
But anyway if you go and click on that, and you click on “download,” and it will start downloading the data for you. Or, what it will do is take you to the terms and conditions. We don’t bypass any T&Cs. The T&Cs come alongside. But you click on that, you agree to that, and then you get the data. So we really try and make it easy for people. There you go. And this is the crime incidence data. Very variable. This is great because it’s KML files, so if you wanted to visualize that you get really great information. It’s all sorts of stuff. Sometimes it’s CSVs.
JS: What’s a KML file?
SR: So, Google Earth.
JS: Okay.
SR: Sorry. So, it’s mapping, a mapping file straight away.
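KML, the Google Earth format mentioned above, is just XML, so a file of placemarks like that crime data can be read with a standard XML parser. A minimal sketch over a toy KML string (the placemark and coordinates are invented):

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

def placemarks(kml_text):
    """Yield (name, lon, lat) for each Placemark point in a KML document."""
    root = ET.fromstring(kml_text)
    for pm in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        coords = pm.findtext(".//kml:coordinates", default="", namespaces=KML_NS)
        lon, lat = coords.strip().split(",")[:2]   # KML orders lon,lat,alt
        yield name, float(lon), float(lat)

doc = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
<Placemark><name>Incident 1</name>
<Point><coordinates>-122.41,37.77,0</coordinates></Point></Placemark>
</Document></kml>"""
points = list(placemarks(doc))
# points == [("Incident 1", -122.41, 37.77)]
```

That is what makes KML releases so convenient for visualization: the geography comes already structured.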
SR: Okay, so one of the things we ask people to do is to submit visualizations and applications they’ve produced. So for instance, London has some very very good open data. If you haven’t looked around the Data Store, it’s really worth going to. And one of the things they do is they provide a live feed of all the London traffic cameras. You can watch them live. And this is a lovely thing, because what somebody’s done is they’ve written an iPad application. So you can watch live TFL, Transport for London, traffic cameras on your iPad.
And you see that data set has been rated. A couple of people have gone in there and rated it. You’ve got a download button, the download is XML. So we try and help people around this data. And this is growing now. Every time somebody launches an open government data site we’re gonna put it on here, and we’re working on a few more at the moment. So we want it to be the place that people go to. Every time you Google “world government data” it pops up at the top, which is what you want. You want people who are just trying to compare different countries and don’t know where to start, to help them find a way through this maze of information that’s out there.
JS: So do you intend to do this for every country in the world?
SR: Every country in the world that launches an open government data site, we’ll whack it on here. And we’re working– at the moment there are about 20 decent open government data sites around the world. We’re picking those up. We’ve got on here now, how many have we got? One, two, three, four, five, six, seven, eight. We’ll have 20 on in the next couple of weeks. We’re really working through them at the moment.
And what this does is, it scrapes them. So basically, we don’t– for us it’s easy to manage because we don’t have to update these data sets all the time. The computer does that for us. But basically, what we do provide people with is context and background information, because you’re part of the data site there.
JS: So let me make sure I have this clear. So you’re not sucking down the actual data, you’re sucking down the list and descriptions of the data sets available?
SR: Absolutely. So we’re providing people, because basically we want it to be as updated as possible. We don’t– if we just uploaded it onto our site, that would kind of be pointless, and it would mean it would be out of date. This way, if something pops up on data.gov and stays there, we’ll get it quick on here. We’ll help people find it. Helping people find the data, that’s our mission here. It’s not just generating traffic, it’s to help people find the information, because we want people to come to us when they’re looking for data.
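In other words, the index stores metadata (titles, descriptions, source links) rather than the data sets themselves, and search runs over that metadata. A toy sketch of the idea, with hypothetical catalogue entries:

```python
def search(index, query):
    """Return metadata records whose title or description mentions every query word.
    `index` holds scraped catalogue entries, not the data sets themselves."""
    words = query.lower().split()
    return [rec for rec in index
            if all(w in (rec["title"] + " " + rec["description"]).lower()
                   for w in words)]

# Hypothetical scraped entries, invented for illustration:
index = [
    {"title": "Recorded crime by borough", "description": "Monthly counts",
     "source": "data.gov.uk"},
    {"title": "Traffic cameras", "description": "Live camera feed locations",
     "source": "data.london.gov.uk"},
]
hits = search(index, "crime")
# hits[0]["source"] == "data.gov.uk"
```

Because only metadata is stored, the index stays small and never goes stale: the download link always points back at the source government site.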
JS: So, okay. You’ve talked about, it sounds like, two different projects. The Data Blog, where you collect and clean up and present data that you–
SR: That we find interesting. We’re selective.
JS: In the process of the Guardian’s newsgathering.
SR: Yeah, and just things that are interesting anyway. So the Doctor Who post that we were just looking at is just interesting to do. It’s not anything we’re going to do a story about. And often they’ll be things that are in the news, say that day, and I’ll think “oh that’s a good thing to put on the Data Blog.” So it could be crime figures, or it could be– and sometimes, the side effect of that is a great side effect because you end up with a piece in the paper, or a piece on the web site. But often it might be the Data Blog is the only place to get that information.
JS: And you index world government data sites.
SR: Yeah, absolutely.
JS: Does the Guardian do anything else with data?
SR: Yeah, well what we do is, we’re doing a lot of Guardian research with data. So what we want to do is give people a kind of way into that. So for instance, we do do a lot of data-based projects. So for instance we’re doing an executive pay survey of all the biggest companies, how much they pay their bosses and their chief executives. That has always been a thing the paper’s always done for stories. And now what we’ll do is we’ll make that stuff available– that data available for people. So it’s not just new data journalism, it’s quite old data journalism. We’ve been doing it for ten years. But we used to just call it a survey. Now it’s data journalism, because it’s getting stories out of numbers. So we’ll work with that, and we’ll publish that information for people to see. And there are a couple of big projects coming up this week, which I really can’t tell you about, but next week it will be obvious what they are.
JS: Probably by the time this goes up we’ll be able to link to them.
[Simon was referring to the Guardian's data journalism work on the leaked Afghanistan war logs, described in a thorough post on the Data Blog.]
SR: Yeah, I’ll mail you about them. But we’ve got now an area of expertise. So increasingly what I’m finding is that I’m getting people coming to me within The Guardian, saying, so we’ve got this spreadsheet, well how can I do this? So for instance that Academies thing we were just looking at, we were really keen to find out which areas were the most, where the most schools were, for the paper. The correspondent wanted to know that. So actually, because we’ve got this area of expertise now in managing data, we’re becoming kind of a go-to place within The Guardian, for journalists who are just writing stories where they need to know something, or they need to find some information out, which is an interesting side effect. Because it used to be that journalists were kind of scared of numbers, and scared of data. I really think that was the case. And now, increasingly, they’re trying to embrace that, and starting to realize you can get stories out of it.
JS: Well that’s really interesting. Let’s talk for a minute about how this applies to other newsrooms, because it’s– as you say, journalists have been traditionally scared of data.
SR: Yeah, absolutely. You could say they prided themselves, in this country anyway, they prided themselves on lack of mathematical ability. I would say.
JS: Which seems unfortunate in this era.
SR: Yeah, absolutely. Yeah, yeah, absolutely.
JS: But especially a lot of our readers are from smaller newsrooms, and so what kind of technical capability do you need to start tracking data, and publishing data sets?
SR: I think it’s really minimal. I mean, the thing is that actually, what we’re doing is really working with a basic, most of the time just basic spreadsheet packages. Excel or whatever you’ve got. Excel is easy to use, but it could be any package really. And we’re using Google spreadsheets, which again is widely available for people to share information. We’re using visualization tools which are again, ManyEyes or Timetric which are widely available and easy to use. I think what we’re doing is just bringing it together.
I think traditionally that journalists wouldn’t regard data journalism as journalism. It was research. Or, you know, how is publishing data– is that journalism? But I think now, what is happening is that actually, what used to happen is that we were the kind of gatekeepers to this information. We would keep it to ourselves. So we didn’t want our rivals to get ahold of it, and give them stories. We’d be giving stories away. And we wouldn’t believe that people out there in the world would have any contribution to make towards that. Now, that’s all changed now. I think now we’ve realized that actually, we’re not always the experts. Be it Doctor Who or Academy schools, there’s somebody out there who knows a lot more than you do, and can thus contribute. So you can get stories back from them, in a way. So we’re receiving the information much more.
JS: So you publish the data, and then other people build stories out of it, is that what you’re saying?
SR: Other people will let us know– they’ll say, well that’s an interesting story, or this is a good visualization. We’ve published data for other people to visualize. We thought, that’s quite an interesting thing to mash it up with, we should do that ourselves. So there’s that thing, and there’s also the fact that if you put the information out there, you always get a return. You get people coming back.
So for instance the Academies thing today that we were talking about. We’ve had people come back saying, well I live in Derbyshire and I know that those schools are in quite wealthy areas. So we start to think, well is there a trend towards schools in wealthy areas going to this, and schools in poorer areas not going to this.
So it gives you extra stories or extra angles on stories you wouldn’t think of. And I think that’s part of it. And I think partly there’s just the realization that just publishing data in itself, because it’s interesting, is a journalistic enterprise. Because I think you have to apply journalistic treatment to that data. You have to choose the data in a selective, editorial fashion. And I think you have to process it in a way that makes it easy for people to use, and useful to people.
JS: So last question here, which is of course going to be on many editors’ and publishers’ minds.
SR: Sure.
JS: Let’s talk about traffic and money. How does this contribute to the business of The Guardian?
SR: Okay, it’s an experiment for us, but traffic-wise it’s been pretty healthy. During the election we were getting a million page impressions in a month on the Data Blog, which is not bad. Now, out of the 36 million that The Guardian gets as a whole, that doesn’t seem like a lot. But actually, in the firmament of Guardian web sites that’s not bad. That’s kind of upper tier. And this is only after being around for a year.
So in terms of what it gives us, it gives us the same as producing anything that generates traffic: it’s good for the brand, and it’s good for The Guardian site. In the long run, I think there is probably canny money to be made there for organizations that can manage and interpret data. I don’t know exactly how, but I think we’d have to be pretty dumb if we didn’t come up with something. I’d be very surprised. It’s an area with a lot of potential. There are plenty of people who don’t really know how to manage or organize data, and that’s an opening for us to get involved in that area. I really think that.
But also I think that, just journalistically, it’s as important to do this as it is to write a piece about a fashion week or anything else we might employ a journalist to do. And in a way it’s more important, because The Guardian is about open information: since the beginning we’ve campaigned for freedom of information and access to information, and this is the ultimate expression of that.
And we, on the site, we use the phrase “facts are sacred.” And this comes from the famous C. P. Scott who said that “comment is free,” which as you know is the name of our comment site, but “facts are sacred” was the second part of the saying. And I kinda think that is– you can see it on the comment site, there you go. “Comment is free, but facts are sacred.” And that’s what The Guardian’s about. I really think that, you know, this says a lot about the web. Interestingly, I think that’s how the web is changing, in the sense that a few years ago it was just about comment. People wanted to say what they thought. Now I think it’s, increasingly, people want to find out what the facts are.
JS: All right, well, thank you very much for a thorough introduction to The Guardian’s data work.
SR: Thanks a lot.
"Tell the chef, the beer is on me."
"Basically the price of a night on the town!"
"I'd love to help kickstart continued development! And 0 EUR/month really does make fiscal sense too... maybe I'll even get a shirt?" (there will be limited edition shirts for two and other goodies for each supporter as soon as we sold the 200)