
June 24 2013

15:44

Opening up government: A new round of Knight News Challenge winners aims to get citizens and governments better information

Moments ago, the Knight Foundation announced its latest round of winners in the Knight News Challenge, its currently semiannual competition to identify fundable ideas that advance the interests of journalism and the information needs of communities. This round focused on the open government movement, and its eight winners all fit squarely into that box. More about them below.

But the big news is what Knight Foundation CEO Alberto Ibargüen just said here in Cambridge at the opening morning of the 2013 MIT-Knight Civic Media Conference. He asked openly for ideas on what the future of the News Challenge should be, because, as he put it, “It may be finished. It may be that, as a device for doing something, it may be that we’ve gone as far as we can take it.”

#civicmedia @ibarguen @knightfdn asks for ideas on how to take #newschallenge to the next level. He asks is #newschallenge dead? Send ideas

— Damian Thorman (@dthorman) June 24, 2013

The six-year-old News Challenge is probably the highest-profile effort to fund innovation in journalism and media. It has funded many dozens of projects over the years, and beyond that, its application process has forced thousands of people to turn fuzzy ideas into concrete proposals. Knight devotes $5 million a year to the News Challenge, which has evolved from a single annual open call to a series of smaller, faster, more focused contests, with a significant reboot leading into 2012.

With more than a half decade in the rearview, Ibargüen asked what had been accomplished: “What have we actually achieved? How have we changed the way people receive their information? How have we affected the existing news community? … They take, I think, comparatively little notice of the things people in this room do.”

To be clear, he gave no sign of stepping away from funding journalism innovation, which remains a core Knight mission. But he noted that the foundation had maximum flexibility in how to accomplish that goal: “We have a huge luxury: We can do whatever we want to do. We can use whatever process we want to use.”

Which was behind his question to the assembled crowd: “What would you do if you had decided to invest $5 million a year in figuring out how to best get news and information to communities? What would you do?”

There will be at least one more round of the News Challenge later this year (topic TBA), but beyond that, Knight’s thinking about where to take the broader idea. Ibargüen said he expected the foundation would make these decisions over the next four to five months. If you’ve got an idea, get in touch with Knight.

But that’s the future. How about the brand new round of winners? Civic Insight promises to create better databases of vacant properties so activists can better connect land to opportunities. OpenCounter wants to make it easier for small businesses to navigate local regulation. Outline.com aims to build public policy simulators, estimating the impact of legislative decisions on people’s circumstances. The Oyez Project will offer clear case summaries of the suits before American appellate courts. GitMachines wants to make it easier for governments to add servers quickly.

As I wrote in January for the last round of announcements, the “News” in Knight News Challenge seems to be moving out of the spotlight in favor of a broader concept of connecting civic information to people who can use it. In the classical American 20th century news model, that was a role that typically involved journalists as intermediaries. Today, though, those communities of self-interest can organize in ways more efficient than a newspaper’s subscriber list. While a few of the projects funded could be of use to journalists — making data available to the general public also makes it available to reporters, who can then approach it with a different set of interests — they’re not the primary target. (That growing disconnect, I imagine, is something that will be addressed in whatever new form the News Challenge takes.)

Civic Insight

Award: $220,000
Organization: Civic Industries
Project leads: Alex Pandel, Eddie Tejeda and Amir Reavis-Bey
Twitter: @CivicInsight, @alexpandel, @maromba, @eddietejeda

Neighbors, cities, nonprofits and businesses all have an interest in seeing vacant properties become productive again. However, a lack of public access to information about these properties makes it difficult for groups to work together on solutions. By plugging directly into government databases, Civic Insight provides real-time information on vacant and underutilized properties, enabling more collaborative, data-driven community development. With Civic Insight, journalists and residents can search for a property on a map and learn about its ownership, inspection and permitting history, and subscribe to receive real-time notifications about changes. Civic Insight grew out of a successful pilot in New Orleans called BlightStatus, which was created during the team’s 2012 Code for America fellowship. It is now available for licensed use by cities nationwide. Knight Foundation’s support will help the team expand the software and test new use cases in more communities.

Team: Eddie Tejeda is a web developer and former Code for America fellow who brings 10 years of experience working on open-source civic projects such as Digress.it and Littlesis.org. Tejeda is engaged in the Open Gov movement in his home city of Oakland, where he co-founded OpenOakland and serves as a mayoral appointee to the city’s Public Ethics Commission, which oversees government transparency.

Alex Pandel is a designer, communicator and community organizer. Before her 2012 Code for America fellowship with the City of New Orleans, Pandel was engaged in public-interest advocacy work with CalPIRG, as well as designing print and web solutions for organizations like New York Magazine and The Future Project.

Amir Reavis-Bey is a software engineer with experience building client-server applications for investment bank equities trading. He also has web development experience helping non-profits to collaborate and share resources online to promote human rights activism. He spent 2012 partnering with the City of New Orleans as a Code for America fellow.

GitMachines

Award: $500,000
Organization: GitMachines
Project leads: Greg Elin, Rodney Cobb, Ikjae Park, Terence Rose, Blaine Whited and John Lancaster
Twitter: @gregelin

Governments are often reluctant to adopt new software and technology because of security and compliance concerns. GitMachines allows developers doing civic innovation to easily build new technology governments can use faster, by offering a grab-and-go depot of accreditation-ready servers that support their projects. Unlike traditional servers that can take hours or days to set up, GitMachines can be up and running in minutes and are pre-configured to meet government guidelines. This makes it easier for governments to adopt open source software, and will help government agencies adopt new technology more quickly in the future.

Team: Rodney Cobb is a mobile developer and data analyst working in Washington, D.C. Through his previous work with Campus Compact, Cobb has worked on several projects combining civic engagement/service learning and virtual interaction. Cobb received a bachelor's in political science from Clark Atlanta University and his master's in politics from New York University.

Greg Elin has spent 20 years developing easy-to-use information tools and helping organizations embrace disruptive technologies. In 2006, Elin created the Sunlight Foundation's Sunlight Labs. He was chief technology officer at United Cerebral Palsy before entering the civil service in 2010 as one of the first chief data officers in the federal government. Elin has been leading the Federal Communications Commission's efforts to lower data collection burden and improve data sharing with modern web service APIs. He was a member of the White House Task Force on Smart Disclosure, exploring machine-readable data as a policy tool and citizen aid. Elin has a master's in interactive telecommunications from New York University's Tisch School of the Arts.

John Lancaster has a bachelor's degree in computer science, a minor in studio art, and is studying for his master's in information systems technology. For the past four years he has worked as a technology consultant at the Department of State, where he builds mission-critical websites that reach a global audience in over 60 languages and manages the server infrastructure that supports the entire operation.

Ikjae Park is an expert in software development and system administration working for a government contractor, and has developed enterprise Java applications at Salesforce.com, among others. He is passionate about development and about building simple workflow processes for the community.

Terence Rose is a senior business analyst with MIL Corp., currently leading content development and user experience for high-profile Department of Commerce projects. He previously worked as a technologist on contract for the Office of Head Start.

Blaine Whited is a programmer and systems administrator with a bachelor’s in computer science.

OpenCounter

Award: $450,000
Organization: OpenCounter
Project leads: Peter Koht, Joel Mahoney
Twitter: @opencounter, @yurialeks, @joelmahoney

While entrepreneurs may have market-moving ideas, very few can expertly navigate the local government permitting process that allows them to open and operate. Whether it's a startup, boutique or restaurant, OpenCounter helps to simplify this interaction with city government. It collects and sorts data on existing regulations while providing running totals of the costs and time involved in setting up shop. A team of Code for America fellows developed and piloted OpenCounter in Santa Cruz, Calif., during 2012. Knight Foundation funds will support OpenCounter's expansion to new communities, including several 2013 Code for America cities.

Team: Peter Koht, a self-described civics nerd, is an experienced economic development professional who most recently worked for the City of Santa Cruz. Koht worked on a number of issues at the city, including leading a regional broadband policy group, opening up city data and spearheading policy initiatives that lowered administrative barriers to job creation. Before his public-sector role, he worked in technology and media.

Joel Mahoney is a civic technologist and serial entrepreneur. He was an inaugural fellow at Code for America, and served as a technical advisor to the organization. Before Code for America, Mahoney founded several startups, including an online travel site, a genetics visualization tool and an m-health platform for diabetics. His work has been featured in The Washington Post, The Boston Globe and The New York Times.

Open Gov for the Rest of Us

Award: $350,000
Organization: LISC Chicago
Project leads: Susana Vasquez, Dionne Baux, Demond Drummer, Elizabeth Rosas-Landa
Twitter: @liscchicago

Open Gov for the Rest of Us is seeking to engage neighborhoods on Chicago’s South Side in the Open Government movement. The three-stage campaign will connect more residents to the Internet, promote the use of open government tools and develop neighborhood-driven requests for new data that address residents’ needs. Building on the success of LISC Chicago’s Smart Communities program and Data Friday series, the project aims to spread a culture of data and improved use of digital tools in low-income neighborhoods by directly involving their residents.

Team: Susana Vasquez is LISC Chicago’s executive director. Vasquez joined LISC in 2003 as a program officer and soon became director of the office’s most ambitious effort – the New Communities Program, a 10-year initiative to support comprehensive community development in 16 neighborhoods. She has a bachelor’s degree in history from the University of Illinois and a master’s from Harvard University’s Kennedy School of Government.

Dionne Baux, a LISC Chicago program officer who works on economic development and technology programs, has worked in city government and for nonprofits for more than seven years. Baux leads LISC’s Smart Communities program, which is designed to increase digital access and use by youth, families, businesses and other institutions. She has a master’s degree in public administration, with a focus in government, from Roosevelt University.

Demond Drummer is tech organizer for Teamwork Englewood, an organization formed in 2003 as part of LISC Chicago’s New Communities Program. Its goal is to strengthen the Englewood neighborhood on Chicago’s South Side. Drummer joined Webitects, a web design firm, in summer 2009. Previously, he coordinated a youth leadership and civic engagement initiative in Chicago. A graduate of Morehouse College, he is completing a master’s degree at the University of Chicago Divinity School.

Elizabeth Rosas-Landa is the Smart Communities program manager at The Resurrection Project in Chicago’s Pilsen neighborhood. A Mexico City native, she received a bachelor’s degree in information technology from Insurgentes University and later joined the Marketing and Promotion Company in Mexico. In 2008, she moved to the United States to work with community organizations on technology issues. At The Resurrection Project, Rosas-Landa has implemented computer literacy programs for residents and businesses.

Outline.com

Award: unspecified, through Knight Enterprise Fund
Organization: Outline.com
Project leads: Nikita Bier, Jeremy Blalock, Erik Hazzard, Ray Kluender
Twitter: @OutlineUSA

Outline.com is developing an online public policy simulator that allows citizens and journalists to visualize the impact that particular policies might have on people and their communities. For instance, with Outline.com, a household can measure how a tax cut or an increase in education spending would affect its income. The project builds on the team's award-winning app Politify, which simulated the impacts of the Obama and Romney economic plans during the 2012 campaign. The Outline.com simulator uses models developed by a team of economists, backed by open data on American households from the IRS, the Census Bureau and other sources. The Commonwealth of Massachusetts has hired Outline.com to develop an official pilot. The team is part of the TechStars Boston accelerator.

Team: Nikita Bier, CEO, recently graduated from the University of California at Berkeley with honors and degrees in business administration and political economy. During his college years, he researched higher education finance, receiving recognition for his insights from the president of the university. While a student, he founded Politify.us, an award-winning election application that received national coverage. Before that, he worked in business development at 1000memories, a Greylock and YCombinator-backed startup.

Jeremy Blalock, CPO, led design and development for Politify.us. He is currently on leave from UC Berkeley, where he studied electrical engineering and computer science.

Erik Hazzard, CTO, is an active member of the data visualization and mapping communities. He was formerly lead developer at Visual.ly. He is the author of OpenLayers 2.10 Beginner’s Guide. He graduated from Florida State University with a bachelor’s degree in information science.

Ray Kluender graduated with honors from the University of Wisconsin with majors in economics, mathematics and political science. His extensive research experience includes involvement in developing value-added models of teacher effectiveness for Atlanta, New York City and Los Angeles public schools, election forecasting for Pollster.com and studying optimal health insurance design and government intervention in health care at the National Bureau of Economic Research. He will be starting his Ph.D. in economics at MIT this August.

Note: Outline.com is receiving funds through the Knight Enterprise Fund, an early stage venture fund that invests in for-profit ventures aligned with Knight’s mission of fostering informed and engaged communities. In line with standard venture-capital practices, the funding amounts are not being disclosed.

Oyez Project

Award: $600,000
Organization: IIT Chicago-Kent College of Law
Project lead: Jerry Goldman
Twitter: @oyez

The activities of courts across the country are often hard to access and understand. For the past 20 years, the Oyez Project has worked to open the U.S. Supreme Court by offering clear case summaries, opinions and free access to audio recordings and transcripts. With Knight Foundation funding, Oyez will expand to state supreme and federal appellate courts, offering information to the public about the work of these vital but largely anonymous institutions. Beginning in the five largest states that serve over one-third of the American public, Oyez will work with courts to catalog materials and reformat them following open standards practices. In conjunction with local partners, Oyez will annotate the materials, adding data and concise summaries that make the content more accessible for a non-legal audience. Oyez will release this information under a Creative Commons license and make it available online and through a mobile application.

Team: Professor Jerry Goldman of the IIT Chicago-Kent College of Law has brought the U.S. Supreme Court closer to everyone through the Oyez Project. He has collaborated with experts in linguistics, psychology, computer science and political science with major financial support from the National Science Foundation, the National Endowment for the Humanities, Google and a select group of national law firms to create an archive of 58 years of Supreme Court audio. In recent years, Oyez has put the Supreme Court in your pocket with mobile apps, iSCOTUSnow and PocketJustice.

Plan in a Box

Award: $620,000
Organization: OpenPlans
Project leads: Frank Hebbert, Ellen McDermott, Aaron Ogle, Andy Cochran, Mjumbe Poe
Twitter: @OpenPlans

Local planning decisions can shape everything about a community — from how residents get around, to how they interact with their neighbors and experience daily life. Yet information on projects — from new plans for downtown centers to bridge replacements — is often difficult to obtain. This project will be an open-source web-publishing tool that makes it easy to engage people in the planning process. With minimal effort, city employees will be able to create and maintain a useful website that provides information that citizens and journalists need while integrating with social media and allowing for public input.

Team: Aaron Ogle is an OpenPlans software developer. Prior to OpenPlans, he was a fellow at Code for America where he partnered with the City of Philadelphia to build solutions to help foster civic engagement. He specializes in JavaScript and GIS development and has contributed to such applications as reroute.it, septa.mobi, changeby.us, walkshed.org and phillystormwater.org.

Andy Cochran, creative director, provides design vision for OpenPlans’ projects, building user interfaces for tools that enable people to be better informed and stay engaged in local issues. Cochran has a bachelor’s degree from the Maryland Institute College of Art, and he has over a decade of experience in print and web design.

Ellen McDermott leads OpenPlans’ outreach to community organizations and cities, to help them be effective in using digital and in-person engagement tools. She also manages operations for OpenPlans. Previously, McDermott was the director of finance and administration for Honeybee Robotics, a technology supplier to the NASA Mars programs. She is a graduate of Amherst College and King’s College London.

Frank Hebbert leads the software team at OpenPlans. Outside of work, he volunteers with Planning Corps, a network of planners providing assistance to non-profit and community groups. Hebbert holds a master’s degree in city planning from MIT.

Mjumbe Poe is a software developer for OpenPlans. Previously, Poe was a fellow at Code for America, and before that a research programmer at the University of Pennsylvania working on modeling and simulation tools for the social sciences.

Procure.io

Award: $460,000
Organization: Department of Better Technology
Project leads: Clay Johnson and Adam Becker
Twitter: @cjoh, @AdamJacobBecker

The government procurement process can be both highly complicated and time-consuming — making it difficult for small businesses to discover and bid on contracts and for journalists and transparency advocates to see where public money is going. As White House Presidential Innovation Fellows, Clay Johnson and Adam Becker built a simple tool for governments to easily post requests for proposals, or RFPs. Based on its early success at the federal level, the team is planning to expand the software to help states and cities. In addition, they will build a library of statements of work that any agency can adapt to their needs. The goal is to bring more competition into government bidding, as a way to both reduce costs and ensure that the most qualified team gets the job.

Team: Clay Johnson may be best known as the author of The Information Diet: A Case for Conscious Consumption. Johnson was also one of the founders of Blue State Digital, the firm that built and managed Barack Obama’s online campaign for the presidency in 2008. Since 2008, Johnson has worked on opening government, as the director of Sunlight Labs until 2010, and as a director of Expert Labs until 2012. He was named the Google/O’Reilly Open Source Organizer of the Year in 2009, was one of Federal Computing Week’s Fed 100 in 2010, and won the CampaignTech Innovator award in 2011. In 2012, he was appointed as an inaugural Presidential Innovation Fellow and led the RFP-EZ project, a federal experiment in procurement innovation.

Adam Becker is a software developer and entrepreneur. He co-founded and served as chief technology officer of GovHub, a civic-oriented startup that was the first to provide users a comprehensive, geographically calculated list of their government officials. In 2012, he was appointed alongside Johnson as an inaugural Presidential Innovation Fellow and led the development of RFP-EZ.

May 25 2013

19:19

Invitation to become part of RedLATIC - the NetSquared network for Latin America

At the initiative of several NetSquared member institutions (http://www.netsquared.org) and with the support of TechSoup Global (http://www.techsoupglobal.org/), we are putting together a network of organizations and individuals, called RedLATIC, whose goal is the use of technology for social good in the Latin America region.

read more

July 30 2012

18:37

Netsquared Regional Event for Cameroon and Nigeria

The Netsquared Regional Conference for Cameroon and Nigeria is a multi-stakeholder event that will bring together actors from local Netsquared groups, the Internet Society, civil society, diplomatic institutions, government and the tech world to discuss issues related to the social web and nongovernmental diplomacy. Over a two-day event, citizens from three neighboring countries (Cameroon, Nigeria and the Central African Republic) will seek to address the following challenges:

- The difficulties faced in introducing the social web for social development in the sub region

read more

April 30 2012

14:00

How to Contribute to OpenStreetMap and Grow the Open Geodata Set

Hundreds of delegates from government, civil society, and business gathered in Brasilia recently for the first Open Government Partnership meetings since the initiative's inception. The main issues debated were transparency, accountability, and open data as fundamental building blocks of a new, open form of government. On the occasion of these meetings, we took the opportunity to expand an open data set by adding street names to OpenStreetMap.

Getting ready to survey the Cruzeiro neighborhood in Brasilia.

OpenStreetMap, sometimes dubbed the "Wikipedia of maps," is an open geospatial database. Anyone can go to openstreetmap.org, create an account, and add to the world map. The accessibility of this form of contribution, paired with the openness of its common data repository, holds a powerful promise of commoditized geographic data.

As this data repository evolves, along with corresponding tools, many more people gain access to geospatial analysis and publishing -- which previously was limited to a select few.

When Steve Coast founded OpenStreetMap in 2004, the proposition to go out and crowdsource a map of the world must have sounded ludicrous to most. After pivotal growth in 2008 and the widely publicized rallying around mapping Haiti in 2010, the OpenStreetMap community has proven how incredibly powerful a free-floating network of contributors can be. There are more than 500,000 OpenStreetMap contributors today. About 3 percent (that's still a whopping 15,000 people) contribute a majority of the data, with roughly 1,300 contributors joining each week. Around the time when Foursquare switched to OpenStreetMap and Apple began using OpenStreetMap data in iPhoto, new contributors jumped to about 2,300 per month.

As the Open Government Partnership meetings took place, we wanted to show people how easy it is to contribute to OpenStreetMap. So two days before the meetings kicked off, we invited attendees to join us for a mapping party, where we walked and drove around neighborhoods surveying street names and points of interest. This is just one technique for contributing to OpenStreetMap, one that is quite simple and fun.

Here's a rundown of the most common ways people add data to OpenStreetMap.

Getting started

It takes two minutes to get started with contributing to OpenStreetMap. First, create a user account on openstreetmap.org. You can then immediately zoom to your neighborhood, hit the edit button, and get to work. We recommend that you also download the JOSM editor, which is needed for more in-depth editing.

Once you start JOSM, you can download an area of OpenStreetMap data, edit it, and then upload it. Whatever you do, it's crucial to add a descriptive commit message when uploading -- this is very helpful for other contributors to figure out the intent and context of an edit. Common first edits are adding street names to unnamed roads, fixing typos, and adding points of interest like a hospital or a gas station. Keep in mind that any information you add to OpenStreetMap must be observed fact or taken from data in the public domain -- so, for instance, copying street names from Google is a big no-no.
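
As a minimal sketch that is not part of the original post, one way to find roads still missing a name tag (a common first edit) is to query the public Overpass API. The bounding box below is a placeholder roughly covering central Brasilia, and the 20-result limit is arbitrary.

    # Query the Overpass API for highway ways without a "name" tag inside a
    # (south, west, north, east) bounding box, and print their approximate centers.
    import requests

    OVERPASS_URL = "https://overpass-api.de/api/interpreter"
    query = """
    [out:json][timeout:25];
    way["highway"][!"name"](-15.80,-47.93,-15.75,-47.85);
    out center 20;
    """

    response = requests.post(OVERPASS_URL, data={"data": query})
    response.raise_for_status()
    for way in response.json()["elements"]:
        center = way.get("center", {})
        print(way["id"], center.get("lat"), center.get("lon"))

Each printed way ID can then be opened in JOSM or on openstreetmap.org to check the road before adding its name.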

Satellite tracing and GPS data

JOSM allows for quick tracing of satellite images. You can simply turn on a satellite layer and start drawing the outlines of features that can be found there, such as streets, building footprints, rivers, and forests. Using satellite imagery is a great way to create coverage fast. We've blogged before about how to do this. Here's a look at our progress tracing Brasilia in preparation for the OGP meetings:

OpenStreetMap contributions in Brasilia between April 5 and April 12.

In places where good satellite imagery isn't available, a GPS tracker goes a long way. OpenStreetMap offers a good comparison of GPS units. Whichever device you use, the basics are the same -- you track an area by driving or walking around and later load the data into JOSM, where you can clean it up, classify it, and upload it into OpenStreetMap.
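
As a side note that is not from the original post, here is a minimal sketch of inspecting a GPX track in Python with the gpxpy library before cleaning it up in JOSM; "survey.gpx" is a hypothetical file exported from a GPS unit.

    # Read a GPX track and print its points as a quick sanity check
    # before loading the file into JOSM.
    import gpxpy

    with open("survey.gpx") as f:
        gpx = gpxpy.parse(f)

    for track in gpx.tracks:
        for segment in track.segments:
            for point in segment.points:
                print(point.time, point.latitude, point.longitude)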

Synchronizing your camera with your tracker

Synchronizing your camera with the GPS unit.

Walking papers

For our survey in Brasilia, we used walking papers, which are simple printouts of OpenStreetMap that let you jot down notes on paper. This is a great tool for on-the-ground surveys to gather street names and points of interest. It's as simple as you'd imagine: you walk or drive around a neighborhood and write down information you see that's missing from OpenStreetMap. Check out the report of our efforts doing this in Brasilia on our blog.

Walking papers for Brasilia.

Further reading

For more details on how to contribute to OpenStreetMap, check out Learn OSM -- it's a great resource with step-by-step guides for the most common OpenStreetMap tasks. Also feel free to send us questions directly via @mapbox.

July 05 2011

21:47

In an era of technology-fueled transparency: data journalism, and the newsroom stack

O'Reilly radar :: MIT's recent Civic Media Conference and the latest batch of Knight News Challenge winners made one reality crystal clear: as a new era of technology-fueled transparency, innovation and open government dawns, it won't depend on any single CIO or federal program. It will be driven by a distributed community of media, nonprofits, academics and civic advocates focused on better outcomes, more informed communities and the new news, whatever form it is delivered in.

Continue to read Alex Howard, radar.oreilly.com

May 27 2011

17:32

Register Now for OpenGov NYC

OpenGov NYC is an exciting unconference to be held in New York City. This one-day event will provide an opportunity for a variety of civically engaged participants to foster conversations about the relationship between participation, transparency, and efficiency.

This is the third event in a series of annual events hosted by the Open NY Forum. These convenings are a way of creating a positive and productive space for government workers, technologists, entrepreneurs, and citizens to come together and engage with one another.

The event will be held on Sunday, June 5, 10am to 6pm at CUNY Graduate School of Journalism in New York City. 


The day will revolve around three primary questions:

  • Where is "local" Open Government going?
  • How can we deepen the knowledge of what Open Government can be?
  • What are the social and technical tools affecting Open Government's development?


This is a must-attend for anyone passionate about technology, transparency, and government. Tickets are $15 and you can register here. Be sure to follow OpenGov Camp creators Open NY Forum on Twitter.

April 14 2011

07:44

Guidelines for Open Data

Open Data: everyone talks about it, but how do you actually do it?

We have tried to give a first answer to that question by writing a short manual: http://tinyurl.com/pendataitalia

Our hope is that this humble piece of work, to which several of our members contributed, can serve as a reference for public administrators, managers and all those decision-makers who are convinced of the soundness of the philosophy behind Open Data Government but have not yet found the toolbox to move from theory to concrete action.

Like any toolbox, this one can be filled with new tools and, thanks to further contributions, become a reference that finally gives Italy, too, a strategy for "digital government".

What does Open Data mean? Why is Open Data a path toward Open Government, and why is Open Government a tool for development? What are the main problems to tackle when you want to "do" Open Data? Which legal questions need to be considered? What are the technical aspects and the organizational impacts? We wanted to provide a first answer to these questions (and a few more), so that everyone can begin to understand why this topic is central to the country's development.

These guidelines follow the Manifesto for Open Government that our association published last November. The next initiatives, which we plan to carry forward with the help of an ever-larger group of experts, will be announced in the coming days during several events we are working on.

In the meantime, anyone who wants to help improve this version can comment on the post. As always, we promise to give full attention to every observation, criticism and proposal for improvement, which will be incorporated into the next version.

We thank the magazine eGov, which printed the first copies of this manual to hand out to everyone attending today's award ceremony at Palazzo Marino.

Come Si Fa Open Data (How to Do Open Data) – Version 1.0

 

April 01 2011

16:59

Map Mashup Shows Broadband Speeds for Schools in U.S.

The Department of Education (DOE) recently launched Maps.ed.gov/Broadband, an interactive map that shows schools across the country and the broadband Internet access speeds available near them. This is an important story for DOE, an agency with a stated goal that all students and teachers have access to a sufficient infrastructure for learning -- which nowadays includes a fast Internet connection. The map is based on open data released last month by the Federal Communications Commission (FCC). The result is a custom map that tells a unique story -- how schools' Internet access compares across the country.

In addition to being an open data mashup, this map shows what can be built with emerging open-source mapping tools. We worked with DOE to process and merge the two data sets, then generated new map tiles using Mapnik, an open-source toolkit for rendering map tiles. We then created the custom overlay of schools and universities using TileMill, our open-source map design studio. Finally, the TileMill layer was added on top of the broadband data.
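
As a rough sketch of that rendering step, and not the actual DOE/FCC pipeline, here is how a Mapnik XML stylesheet (for example, one exported from TileMill) can be rendered to an image with Mapnik's Python bindings; the stylesheet and output file names are hypothetical.

    # Render a map image from a Mapnik XML stylesheet that defines the
    # broadband layer and the schools overlay.
    import mapnik

    m = mapnik.Map(1024, 768)
    mapnik.load_map(m, "schools_broadband.xml")   # hypothetical stylesheet
    m.zoom_all()                                  # fit the map to the data extent
    mapnik.render_to_file(m, "schools_broadband.png", "png")
    print("wrote schools_broadband.png")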

The Feds' Open-Source Leadership

It is great to see both the DOE and FCC able to leverage open data to make smarter policy decisions. Karen Cator, the director of the Office of Educational Technology at DOE, has an awesome blog post about why this mashup matters:

"The Department of Education's National Education Technology Plan sets a goal that all students and teachers will have access to a comprehensive infrastructure for learning, when and where they need it," Cator writes. "Broadband access is a critical part of that infrastructure. This map shows the best data to date and efforts will continue to gather better data and continually refresh the maps."

March 08 2011

17:00

A perpetual motion machine for investigative reporting: CPI and PRI partner on state corruption project

There is a flaw in the investigative reporting model and it has to do with longevity. Follow me on this for a second: A reporter works months at a time scouring documents, meeting sources, verifying details, writing, and perhaps even building a database. And then the piece is published.

And that’s it.

The lifespan of investigative reporting, at least as it’s typically done through newspapers, can be disappointingly short given the painful labor and birthing process. Once stories are released, the hope is the public (or perhaps lawmakers) will pick up the torch to right the wrongs illuminated by reporters. But the drumbeat stops after a while. Reporters have to move on to new assignments, and the public’s desire to change laws and right wrongs can be overtaken by things like #Winning.

In an ambitious new project, the Center for Public Integrity, Public Radio International, and Global Integrity are trying to build a new mechanism that keeps the intensity and awareness of investigative reporting at a steady pace.

What they're building is a fifty-state corruption risk index. Think of it like a Homeland Security threat level indicator that shows just how susceptible your state is to corruption. Already this is no small task: They plan to hire a reporter in each state to do ground-level reporting and compile information for the index, as well as write stories. Where they hope to transform the investigative reporting machine, though, is by going transparent and getting people invested in the project before it officially drops next year. Instead of holding onto information before the project is complete, they'll invite the public in, ask for a little crowdsourcing, and build momentum — and a network. The goal is to make the corruption index something of a perpetual motion machine.

“The idea here is that in recent years really good, solid investigative reporting on the state level has fallen off, and state newspapers have had to make cutbacks,” said Caitlin Ginley, the project coordinator for CPI. “We see this as a great way to revitalize that.”

The corruption index is not without some precedent. In 2009, The Center for Public Integrity released States of Disclosure, a fifty-state ranking of financial disclosure laws for state legislators. Ginley told me they wanted to build on that foundation for the corruption index, using financial disclosure laws, conflict-of-interest laws, FOIA regulations, lobbyist rules, and other accountability standards as indicators to gauge the likelihood of corruption.

“Reporters can take that information and see this is where [their state is] doing very poorly and report that out,” Ginley said.

In its role, Global Integrity will help by creating a methodology and guiding the analysis of the data that comes in. (Reporters will also be using Global Integrity’s Indaba tool to collect and publish information.) The end result will be much like States of Disclosure, with report cards and rankings, as well as background data from each state, Ginley said.

But the work that starts now, aside from the hiring of journalists in each state (JOB ALERT), is identifying people or organizations who can be helpful over the course of the corruption project.

“We have the tools now for people to get engaged in stories as they go along, and that creates a lasting commitment, so it's not a one-shot deal,” said Michael Skoler, vice president of Interactive Media for PRI. Just as important as finding reporters and document-hounding is cultivating a community that can guide and assist the reporting, Skoler told me. (Skoler is familiar with the concept, having established the Public Insight Network while working for American Public Media.)

“The standard mode for investigative reporting is that people don’t talk at all about what they’re doing,” he told me.

PRI will work with its more than 800 partner stations to find expertise and build interest in the project over the next twelve months so that, ideally, when the report is produced, there will be a built-in audience who can share it with others or try to minimize corruption in their state. Projects around government and budgets are ripe for crowdsourcing, but Skoler thinks crowdsourcing is a concept far too many people attempt but ultimately don't understand.

“I think one of the misconceptions about crowdsourcing is when you crowdsource, you’re trying to attract and engage everyone. And that doesn’t work,” Skoler said. “Crowdsourcing is about reaching out to the people who are naturally interested and knowledgeable about something and inviting them to play.”

Within each state, he points out, there are honest government/open government groups, think tanks, academics, and non-profits who have an interest in state corruption and could assist in the project. Skoler thinks approaching these specific people and groups, unlike asking the general readership for help, could produce better results.

He also thinks that approach could help increase the reach of investigative reporting. Instead of hoping that the results a reporter produces will automatically take on a life of their own, the corruption index hopes to apply strategy to extending the shelf life of accountability journalism. As Skoler puts it, “It's a new way of thinking about impact for investigative journalism — and about building impact in through a whole process.”

February 11 2011

18:08

Steve Williams, Director, Corporate Social Responsibility, SAP

Hi everyone,

As part of the global SAP Corporate Social Responsibility team, I am responsible for managing our worldwide Technology Donation program, which provides free reporting and data visualization tools to over 900 non-profits each year in 15 countries. We have been partnering with TechSoup for quite a while now, and I am excited about the many possibilities to engage.

I am most interested in building capacity in the non-profit sector through technology. At SAP we can bring wide experience in business management, along with the skills of 60,000 employees around the world who want to contribute. We also have a large developer ecosystem as part of the SAP Community Network. We have also been supporting interesting work around impact measurement for non-profits and social enterprises through the Demonstrating Value Project.

What I am most interested in from collaborators is understanding how the different pieces of technology (hardware, networking, different software systems) can be integrated and easily consumed by non-profits. I'm also interested in going beyond traditional training on specific applications to helping organizations create strategies and build operational systems that can deliver better results. Finally I want to learn from, and share with, colleagues best practices on engaging employees with technology donations and how to embed these practices into the business so that CSR programs are not "off to the side" but a core part of operations.

You can find me on Twitter at @constructive and on my (infrequently updated) blog at http://www.constructive.net

February 09 2011

21:10

'Data and Cities' Conference Pushes Open Data, Visualizations

When I entered Stamen's offices in the Mission district of San Francisco, I saw four people gathered around a computer screen. What were they doing? Nothing less than "mapping the world" -- not as it appears in flat dimension, but how it reveals itself. And they weren't joking. Stamen, a data visualization firm, has always kept "place" central to many of their projects. They achieved this most famously through their crimespotting maps of Oakland and San Francisco, which give geographical context to the world of crime. This week they are taking on a world-sized challenge as they host a conference that focuses on cities, interactive mapping, and data.

This conference is part of Stamen's Citytracking project, funded by a Knight News Challenge grant, an effort to provide the public with new tools to interact with data as it relates to urban environments. The first part of this project is called dotspotting, and it is startling in its simplicity. While still in early beta, dotspotting aims to create a baseline map by placing linkable dots on locations to yield data sets. The basic idea is to strike a balance between the free, but ultimately not-yours, nature of Google Maps and the infinitely malleable, but overly nerdy, open-source stacks that are out there.


With government agencies increasingly expected to operate within expanded transparency guidelines, San Francisco passed the nation's first open data law last fall, and many other U.S. cities have started to institutionalize this type of disclosure. San Francisco's law is basic and seemingly non-binding. It states that city departments and agencies “shall make reasonable efforts” to publish any data under their control, as long as the data does not violate other laws, in particular those related to privacy. Since the law was passed unanimously by the Board of Supervisors (no small feat in this terminally fractious city), departments have been uploading data at a significant rate to our data clearinghouse website, DataSF. While uploading data to these clearinghouses is the first step, finding ways to truly institutionalize this process has been challenging.

Why should we care about open data? And why should we want to interact with it?

While some link the rise of the open data movement to the most recent recession, its core motivation has always been inherent to citizenship itself: active citizenship. Open data in this sense can mean the right to understand the social, cultural, and societal forces constantly in play around us. As simultaneously the largest consumers and producers of data, cities have a responsibility to engage their citizens with this information. Gabriel Metcalf, executive director of SPUR (San Francisco Planning and Urban Research), and I wrote more about this in our 2010 year-in-review guide.

Stamen's Citytracking project wants to make that information accessible to more than just software developers, at a level of sophistication that allows for both real analysis and widespread participation. Within the scope of this task, Stamen is attempting to bring together democracy, technology, and design.

Why is this conference important?

Data and Cities brings together city officials, data visualization experts, technology fiends, and many others who fill in the gaps between these increasingly related fields. Stamen has also designed the conference to have a mixture of formats, from practical demonstrations to political discussions to highly technical talks.

According to Eric Rodenbeck, Stamen's founder and CEO, "This is an exciting time for cities and data, where the literacy level around visualization seems to be rising by the day and we see huge demand and opportunity for new and interesting ways for people to interact with their digital civic infrastructure. And we're also seeing challenges and real questions on the role that cities take in providing the base layer of services and truths that we can rely on. We want to talk about these things in a setting where we can make a difference."

Data and Cities will take place February 9 - 11 and is invitation-only. In case you haven't scored an invitation, I'll be blogging about it all week.

Selected Speakers:

Jen Pahlka from Code for America - inserting developers into city IT departments across the country to help them mine and share their data.

Adam Greenfield from http://urbanscale.org/ and author of Everyware. Committed to applying the toolkit and mindset of interaction design to the specific problems of cities.

Jay Nath, City of San Francisco
http://www.jaynath.com/2010/12/why-sf-should-adopt-creative-commons, http://datasf.org

November 08 2010

09:03

Open Government: a new democratic paradigm

In Italy, too, there is a great deal of interest in Open Government, and the many responses we received during this first week of "collaborative writing" of the Manifesto are proof of it.

Often, however, the term Open Government arouses suspicion: perhaps it is the English (which Italians are not very comfortable with), perhaps the fear that it is just a label, a passing fad destined to be forgotten in a few months.

A few days ago I was invited to speak about it by the Radicali Italiani and, in the short talk below, I tried to explain what lies behind this expression and why, in my view, it is not a passing fad.

What do you think?

Open Government: un nuovo paradigma from Ernesto Belisario on Vimeo.

November 07 2010

14:59

A few things we will need to explain about Open Gov in Italy

Dear colleague working in the Italian public administration,
perhaps you are wondering whether we really need a new manifesto, or whether it would be enough to apply the mountain of laws and directives that lie inert and forgotten on the shelves of the public administration.
I believe that, first of all, we need to share many ideas with people like us who work in public administrations, and since rules are usually endured rather than embraced, it is perhaps more useful to start from convergence on an active, positive way of thinking in pursuit of the goal of a genuinely open government.
If you understand an idea and the values it carries, it is easier to find the way to apply it concretely, regardless of the strictly legislative aspects. Those will be needed too, of course, but the starting point is the vision you have of the administration's role with respect to citizens, businesses and the network.
Yes, the network, or perhaps we should write "the Net", which is not just connectivity, bits, information and services. The Net with a capital N is something that generates value that was not there before, enabled by people and above all by the applications people create. But this extraordinary opportunity needs fertile ground, which Italy does not yet have.
This is the first thing we will have to agree on, dear colleague: what are the "essential nutrients" that make our ground truly fertile.
When we make information and services available on the Internet in accessible formats, we are doing only part of our duty. It is true, I admit it: we are complying with the requirements and constraints of the CAD, of the Stanca law and of the latest directive on public administration websites. And probably many citizens will already be satisfied with that, as will most of the bureaucrats around us.
But we can do much better. We can add our small share of fertilizer to the soil of the network, waiting for a seed smarter than the others to put down roots and feed on the information we have left there. How, you may ask? By treating information as if it were a basic infrastructure of the intangible economy (article 4 of the manifesto).
It is simple and not even very costly, but it presupposes an awareness that is still not widespread. There are essentially four issues to understand: formats, licenses, updating and access.
To do a good job we will have to publish our administration's data in open formats, suitable for interpretation by software and not only by humans. For example, if we publish the list of public Wi-Fi areas in our city, with the location of the antennas, it is not enough to present a nice image of the map of the installations. That will be useful information for tourists and residents, but it will not be reusable in any way by software. If, however, we also attach to the map a simple table presenting the same data with addresses or geographic coordinates, in a format like CSV, the kind you can export even from an ordinary Excel spreadsheet, then our information becomes fertile ground. A company or an enthusiast could collect all the data of this kind from Italian cities and build an always up-to-date map of the country's public Wi-Fi areas. Or they could connect data on the presence of radio antennas with the incidence of certain health problems among residents of a specific area, provided the local health authority made that data available.
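As an illustrative sketch that is not part of the original letter, this is how such a table could be produced with a few lines of Python; the antenna rows are invented example values.

    # Write the Wi-Fi antenna locations shown on the map as a plain CSV table
    # that software can reuse.
    import csv

    antennas = [
        {"name": "Piazza Grande", "lat": 44.6471, "lon": 10.9252},
        {"name": "Biblioteca Delfini", "lat": 44.6459, "lon": 10.9266},
    ]

    with open("wifi_pubblico.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "lat", "lon"])
        writer.writeheader()
        writer.writerows(antennas)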
It is immediately clear that periodically updating this information is one of the key elements, together with easy access over the network. It would be enough to create a dedicated web space where all of our administration's "freed" data is collected, and wait for some seed to take root.
One element is still missing, though, and it is essential. Who owns the data we are going to make available? How can a citizen, a business or an association know they can use it freely? We will have to accompany this data with a license that allows its use in full compliance with the law. It is easy: the problem has already been tackled successfully at the international level, and there are now simple, workable Italian formulas as well, such as the Italian Open Data License v1.0. (For more on licenses, see Ernesto Belisario's post.) All that remains is to state clearly which license we use.
We spoke earlier about the intangible economy of the network, meaning that beyond generating public value, someone might even make money from it. Would that be a problem? Frankly, I do not think so. If public data is on the network and someone uses it intelligently to extract something useful, so much the better; the market and, above all, people will decide whether that product is worth paying for (article 5).
At this point you may be wondering where the catch is: if everything is so simple, why have we not done it already? The answer is simple: we still have to absorb the concept underlying all this concrete action, namely that a new model of transparency is needed (article 3). Today the rules tell us: "you must make transparent everything in the following list." It is a long list, and with the latest provisions it covers many things, true, but what we are talking about is completely different. The idea the manifesto proposes could be summarized like this: "make transparent everything that is not expressly forbidden by law in order to protect citizens' privacy, and use every resource at your disposal to inform and involve them, because collective intelligence can bring you great results" (articles 6 and 7).
Be careful, though, dear colleague: if you find yourself sharing some of these ideas, be aware that you may end up handling material that someone else considers inconvenient or unsuitable. Not you, of course, but the many administrators and bureaucrats who think it is better that certain things "not be known in too much detail": environmental data, pollution, noise, spending, investments and so on. They will keep asking you to publish polished reports and press releases on the web, bulky PDFs to download, thinking that the fruit of their careful processing is enough to satisfy the requirement of transparency.
Well, today that is no longer the case, and all of us here share the idea that in this network, ever more populated by applications, we will gladly welcome the fruits of collective intelligence if they can simplify our lives and help us have a greater impact on the society we live in.

Claudio Forghieri

October 06 2010

20:57

The Texas Observer - Creating a Citizen Watchdog Network

Greetings all - The Texas Legislative Session is coming up in January and, as the old saying here goes, no man or his property is safe. The Texas Observer has an active social networking segment, IObserve, on our site, but it's only open to subscribers who donate a minimum of $35 to the Observer. Funding would allow us to improve our software and, most importantly, IT WOULD ALLOW US TO MAKE THE SITE AVAILABLE TO ALL, with no sign-up fee. Our site traffic has doubled in the last year, and the Observer is known as the go-to group for legislative coverage. We know we can make IObserve a buzzing hub of citizen/reporter interaction and information.

read more

August 05 2010

17:00

How The Guardian is pioneering data journalism with free tools

The Guardian takes data journalism seriously. They obtain, format, and publish journalistically interesting data sets on their Data Blog, they track transparency initiatives in their searchable index of world government data, and they do original research on data they’ve obtained, such as their amazing in-depth analysis of 90,000 leaked Afghanistan war documents. And they do most of this with simple, free tools.

Data Blog editor Simon Rogers gave me an action-packed interview in The Guardian’s London newsroom, starting with story walkthroughs and ending with a philosophical discussion about the changing role of data in journalism. It’s a must-watch if you’re wondering what the digitization of the world’s facts means for a newsroom. Here’s my take on the highlights; a full transcript is below.

The technology involved is surprisingly simple, and mostly free. The Guardian uses public, read-only Google Spreadsheets to share the data they’ve collected, which require no special tools for viewing and can be downloaded in just about any desired format. Visualizations are mostly via Many Eyes and Timetric, both free.
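
As a minimal sketch, and not The Guardian's own tooling, a public, read-only Google Spreadsheet can be pulled straight into Python through its CSV export URL; the spreadsheet key below is a placeholder.

    # Download a published Google Spreadsheet as CSV and load it with pandas.
    import pandas as pd

    SPREADSHEET_KEY = "your-spreadsheet-key-here"  # placeholder key
    url = f"https://docs.google.com/spreadsheets/d/{SPREADSHEET_KEY}/export?format=csv&gid=0"

    df = pd.read_csv(url)
    print(df.head())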

Data Blog posts are often related to or supportive of news stories, but not always. Rogers sees the publishing of interesting data as a journalistic act that stands alone, and is clear on where the newsroom adds value:

I think you have to apply journalistic treatment to data. You have to choose the data in a selective, editorial fashion. And I think you have to process it in a way that makes it easy for people to use, and useful to people.

The Guardian curates far more data than it creates. Some data sets are generated in-house, such as its yearly executive pay surveys, but more often the data already exists in some form, such as a PDF on a government web site. The Guardian finds such documents, scrapes the data into spreadsheets, cleans it, and adds context in a Data Blog post. But they also maintain an index of world government data which scrapes open government web sites to produce a searchable index of available data sets.
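The PDF-to-spreadsheet step that runs through the interview below can also be done with free tooling. Here is a minimal sketch using the open-source pdfplumber library and a hypothetical file name; it is a present-day stand-in rather than the Guardian's own workflow, and real government PDFs usually need some extra cleanup of the extracted rows.

    import csv
    import pdfplumber

    rows = []
    with pdfplumber.open("schools_list.pdf") as pdf:   # hypothetical file name
        for page in pdf.pages:
            table = page.extract_table()               # best-guess table detection
            if table:
                rows.extend(table)

    with open("schools_list.csv", "w", newline="") as out:
        csv.writer(out).writerows(rows)

    print("Extracted", len(rows), "rows")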

“Helping people find the data, that’s our mission here,” says Rogers. “We want people to come to us when they’re looking for data.”

In alignment with their open strategy, The Guardian encourages re-use and mashups of their data. Readers can submit apps and visualizations that they’ve created, but data has proven to be just as popular with non-developers — regular folks who want the raw information.

Sometimes readers provide additional data or important feedback, typically through the comments on each post. Rogers gives the example of a reader who wrote in to say that the Academy schools listed in his area in a Guardian data set were in wealthy neighborhoods, raising the journalistically interesting question of whether wealthier schools were more likely to take advantage of this charter school-like program. Expanding on this idea, Rogers says,

What used to happen is that we were the kind of gatekeepers to this information. We would keep it to ourselves. So we didn’t want our rivals to get ahold of it, and give them stories. We’d be giving stories away. And we wouldn’t believe that people out there in the world would have any contribution to make towards that.

Now, that’s all changed now. I think now we’ve realized that actually, we’re not always the experts. Be it Doctor Who or Academy schools, there’s somebody out there who knows a lot more than you do, and can thus contribute.

So you can get stories back from them, in a way…If you put the information out there, you always get a return. You get people coming back.

Perhaps surprisingly, data also gets pretty good traffic, with the Data Blog logging a million hits a month during the recent election coverage. “In the firmament of Guardian web sites that’s not bad. That’s kind of upper tier,” says Rogers. “And this is only after being around for a year.” (The even younger Texas Tribune also finds its data pages popular, accounting for a third of total page views.)

Rogers and I also discussed the process of getting useful data out of inept or uncooperative governments, the changing role of data specialists in the newsroom, and how the Guardian tapped its readers to produce the definitive database of Doctor Who villains. Here’s the transcript, lightly edited.

JS: All right. So. I’m here with Simon Rogers in the Guardian newsroom in London, and you’re the editor of the Data Blog.

SR: That’s right, and I’m also a news editor so I work across the organization on data journalism, essentially.

JS: So, first of all, can you tell us what the Data Blog is?

SR: Ok, well basically it came about because, as I said I was a news editor working a lot with graphics, and we realized we were just collecting enormous amounts of data. And we thought, well wouldn’t our readers be interested in seeing that? And when the Guardian Open Platform launched, it seemed a good time to think about opening up– we were opening up the Guardian to technical development, so it seemed a good time to open up our data collections as well.

And also it’s the fact that increasingly we’ve found people are after raw information. If you looked– and there’s lots of raw information online, but if you start searching for that information you just get bewildering amounts of replies back. If you’re looking for, say, carbon emissions, you get millions of entries back. So how do you know what the right set of data is? Whereas we’ve already done that set of work for our readers, because we’ve had to find that data, and we’ve had to choose it, and make an editorial selection about it, I suppose. So we thought we were able to cut out the middle man for people.

But also we kind of thought when we launched it, actually, what we’d be doing is creating data for developers. There seemed to be a lot of developers out there at that point who were interested in raw information, and they would be the people who would use the data blog, and the open platform would get a lot more traffic.

And what actually happened, what’s been interesting about it, is that– what’s actually happened is that it’s been real people who have been using the Data Blog, as much as developers. Probably more so than developers.

JS: What do you mean “real people”?

SR: Real people, I suppose what I mean is that, somebody who’s just interested in finding out what a number is. So for instance, here at the moment we’ve got a big story about a government scheme for building schools, which has just been cut by the new government. It was set up by the old government, who invested millions of pounds into building new school buildings. And so, we’ve got the full list of all the schools, plus the parliamentary constituency that they’re in, and where they are and what kind of project they were. And that is really, really popular today, that’s one of our biggest things, because there’s a lot of demonstrations about it, it’s a big issue of the day. And so I would guess that 90% of people looking at it are just people who want to find out what the real raw data is.

And that’s the great thing about the internet, it gives you access to the raw, real information. And I think that’s what people really crave. They want the interpretation and the analysis from people, but they also want the veracity of seeing the real thing, without having it aggregated or put together. They just want to see the raw data.

JS: So you publish all of the original numbers that you get from the government?

SR: Well exactly. The only time– with the Data Blog, I try to make it as newsy as possible. So it’s often hooked around news stories of the day. Partly because it helps the traffic, and you’re kind of hooking on to existing requirements.

Obviously we do– it’s just a really eclectic mix of data. And I can show you the screen, for a sec.

JS: All right. Let’s see something.

SR: Okay, so this is the data blog today. So obviously we’ve got Afghanistan at the top. Afghanistan is often at the top at the moment. This is a full list of everybody who’s died, every British casualty who’s died and been wounded over time. So you’ve got this data here. We use, I tend to use a lot of third party services. This is a company called Timetric, who are very good at visualizing time series data. It takes about five minutes to create that, and you can roll over and get more information.

JS: So is that a free service?

SR: Yeah, absolutely free, you just sign up, and you share it. It works a bit like Many Eyes, you know the IBM service.

JS: Yeah.

SR: We’ll embed these Google docs. We use Google docs, Google spreadsheets to share all our information because it’s very easy for people to download it. So say you want to download this data. You click on the link, and it will take you through in a second to, there you go, it’s the full Google spreadsheet. And you’ve got everything on here. You’ve got, these are monthly totals, which you can’t get anywhere else, because nobody else does that information.

JS: What do you mean nobody else does it?

SR: Well nobody else bothers to put it together month by month. You can get totals by year from, iCasualties I think do it, but we’ve just collected some month by month, because often we’ve had to draw graphics where it’s month by month. It’s the kind of thing, actually it’s quite interesting to be able to see which month was the worst for casualties.

We’ve got lists of names, which obviously are in a few places. We collect Afghanistan wounded statistics which are terribly confused in the UK, because what they do is they try and make them as complicated as possible. So, the most serious ones, NOTICAS is where your next of kin is notified. That’s a serious event, but also you’ve got all those people evacuated. So anyway, this kind of data. We also keep amputation data, which is a new set that the government refused to release until recently, and a Guardian reporter was instrumental in getting this data released. So we kind of thought, maybe we should make this available for people.

So you get all this data, and then what you can do, if you click on “File” there, you can download it as Excel, XML, CSV, or whatever format you want. So that’s why we use Google spreadsheets. It’s the kind of thing that’s a very, very easily accessible format for people.

So really what we do is we try and encourage a community, a community to grow up around data and information. So every post has got a talk facility on it.

Anyway, going through it. So this is today’s Data Blog, where you’ve got Afghanistan, Academy schools in the UK. The schools are run by the state, pretty much.

JS: So just to clarify this for the American audience, what’s an Academy school?

SR: Ok, well basically in the UK most schools are state schools, that most children go to. State schools are, we all pay for them, they’re paid for out of our taxes. And they’re run at a local level, which obviously has its advantages because it means that you are, kind of, working to an area. What the new government’s proposing to do is allow any school that wants to to become an Academy. And what an Academy is is a school that can run its own finances, and own affairs.

And what we’ve got is we’ve got the data, the government’s published the data — as a PDF of course because governments always publish everything as a PDF, in this country anyway — and what they give you, which we’ve scraped here, is a list of every school in the UK which has expressed an interest. So you’ve got the local authority here, the name of the school, type of school, the address, and the post code. Which is great, because that’s good data, and because it’s on a PDF we can get that into a spreadsheet quite easily.

JS: So did you have to type in all of those things from a PDF, or cut and paste them?

SR: Good god no. No, no, we have, luckily we’ve got a really good editorial support team here, who, thanks to the Data Blog, are becoming very experienced at getting data off of PDFs. Because every government department would much rather publish something as a PDF, so they can act as if they’re publishing the data but really it’s not open.

JS: So that’s interesting, because in the UK and the US there’s this big government publicity about, you know, we’re publishing all this data.

SR: Absolutely.

JS: But you’re saying that actually–

SR: It’s not 100 percent yet. So, I’ll show you in a second that what they tend to do is just publish– most government departments still want to publish stuff as PDFs. They can’t quite get out of that thing. Or want to say, why would somebody want a spreadsheet? They don’t really get it. A lot of people don’t get it.

And, we wanted the spreadsheet so you can do stuff like this, which is, this is a map of schools interested in becoming Academies by area. And so because we have that raw data in spreadsheet form we can work out how many in the area. You can see suddenly that this part of England, Kent, has 99 schools, which is the biggest in the country. And only one area, which is Barking, up here, in London, which is, sorry, is down here in London, but anyway that has no schools applying at all.

And the government also said at the beginning that it would mainly be schools which weren’t “outstanding” that would apply. But actually if you look at the figures, which again, we can do, the majority of them are outstanding schools. So they’re already schools which are good, which are applying to become academies. Which kind of isn’t the point. But that kind of analysis, that’s data journalism in a sense. It’s using the numbers to get a story, and to tell a story.
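[Both findings Simon describes fall out of simple group-by arithmetic once the list is in a spreadsheet. A minimal pandas sketch, with hypothetical file and column names rather than the Guardian's actual sheet:]

    import pandas as pd

    # Hypothetical file and column names; the real scraped sheet will differ.
    schools = pd.read_csv("academy_applicants.csv")

    # How many interested schools are there in each local authority?
    by_area = schools.groupby("local_authority").size().sort_values(ascending=False)
    print(by_area.head())

    # What share of applicants already hold an "Outstanding" rating?
    share = (schools["ofsted_rating"] == "Outstanding").mean()
    print(f"{share:.0%} of applicants are rated outstanding")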

JS: And how long did that story take you to put together? To get the numbers, and do the graphics, and…?

SR: Well, I was helped a bit, because I got, I’ve had one of my helpers who works in editorial support to get the data onto a spreadsheet. And in terms of creating the graphic we have a fantastic tool here, which is set up by one of our technical development team who are over there, and what it does, is it allows you to paste a load of data, geographic data, into this box, and you tell it what kind, is it parliamentary constituency, or local authority, or educational authority, or whatever, however the different regional differentiations we have in the UK, and it will draw a map for you. So this map here was drawn by computer, basically, and then one of the graphics guys help sort out the labels and finesse it and make it look beautiful. But it saves you the hard work of coloring up all those things. So actually that took me maybe a couple of hours. In total.

JS: How about getting the data, how long did that take?

SR: Oh well luckily that data– you know the government makes the data available. But like I say, as a PDF file. So this is the government site, and that’s the list there, and you open it, it opens as a PDF. Because we’ll link to that.

But luckily the guys in the ESD [editorial services department] are very adept now, because of the Data Blog, at getting data into spreadsheets. So, you know they can do that in 20 minutes.

JS: So how many people are working on data overall, then?

SR: Well, in terms of– it’s my full time job to do it. I’m lucky in that I’ve got an awful lot of people around here who have got an interest who I can kind of go and nudge, and ask. It’s a very informal basis, and we’re looking to formalize that, at the moment. We’re working on a whole data strategy, and where it goes. So we’re hoping to kind of make all of these arrangements a bit more formal. But at the moment I have to fit into what other people are doing. But yeah, we’ve got a good team now that can help, and that’s really a unique thing.

So I was going through the Data Blog for you. So this is a typical, a weird day, so schools, and then we’ve got another schools thing because it’s a big schools day today. This is school building projects scrapped by constituency, full list. Now, this is another where the government didn’t make the data easily available. The department for education published a list of all the school projects that were going to be stopped when the government cut the funding, some of which is going towards creating Academy schools, which is why this is a bit of an issue in the country at the moment. And we want to know by constituency how it was working. So which MPs were having the most school projects cut, in their constituency. And we couldn’t get that list out of the department of education, but one MP had lodged it with the House of Commons library. So we managed to get it from the House of Commons library. But it didn’t come in a good form, it came in a PDF again, so again we had to get someone from tech to sort it out for us.

But the great thing is that we can do something like this, which is a map of projects stopped by constituency, by MP. And most of the projects that were stopped were in Labour seats. As you know Labour are not in power at the moment. So we can do some of this sort of analysis which is great. So there were 418 projects stopped in Labour constituency seats, and 268 stopped in Conservative seats. So basically 40% of Labour MPs had a project stopped, at least one project stopped in their seat, compared to only 27% of Conservatives, and 24% of the Dems who are in power at the moment.
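[The party split Simon quotes reduces to two group-bys once the Commons library list is joined against a list of MPs. A minimal pandas sketch with hypothetical inputs, not the Guardian's own code:]

    import pandas as pd

    # Hypothetical inputs: one row per scrapped project, plus a lookup of all MPs.
    projects = pd.read_csv("projects_stopped.csv")   # columns: constituency, party
    mps = pd.read_csv("mps.csv")                     # columns: constituency, party

    # Total projects stopped, split by the party holding each constituency
    print(projects.groupby("party").size())

    # Share of each party's MPs with at least one project stopped in their seat
    affected = projects["constituency"].unique()
    mps["hit"] = mps["constituency"].isin(affected)
    print(mps.groupby("party")["hit"].mean().round(2))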

JS: So would it be accurate to say the data drove this story, or showed this story, or…?

SR: Data showed this story, which is great, but the one thing, the caveat — of course, the raw numbers are never 100% — the caveat was there were more projects going on in Labour areas because Labour government, previous government which is Labour set up the projects, and they gave more projects to Labour areas. So you can read it either way.

JS: And you said this in the story?

SR: We said this in the story. Absolutely. We always try and make the caveats available for people. So that’s a big story today, because of there are demonstrations about it in London. You’ve come to us on a very education-centered day today.

But there’s other stuff on the blog too. This is a very British thing. We did this because we thought it would be an interesting project to do. I had somebody in for a week and they didn’t have much to do so I got them to make a list of every Doctor Who villain ever.

JS: This was an intern project?

SR: This was an intern project. We kinda thought, yeah, we’ll get a bit of traffic. And we’ve never had so much involvement in a single piece ever. It’s had 500 retweets, and when you think most pieces will get 30 or 40, it’s kind of interesting. The traffic has been through the roof. And the great thing is, so we created–

JS: Ooh, what’s this? This is good.

SR: It’s quite an easy– we use ManyEyes quite a lot, which is very very quick to create lovely little graphics. And this is every single Doctor Who villain since the start of the program, and how many times they appear. So you see the Daleks lead the way in Doctor Who.

JS: Yeah, absolutely.

SR: Followed by the Cybermen, and the Masters in there a lot. And there are lots of other little things. But we started off with about 106 villains in total, and now we’re up to– we put it out there and we said to people, we know this isn’t going to be the complete list, can you help us? And now we’ve got 212. So my weekend has basically been– I’ll show you the data sheet, it’s amazing. You can see the comments are incredible. You see these kinds of things, “so what about the Sea Devils? The Zygons?” and so on.

And I’ll show you the data set, because it’s quite interesting. So this is the data set. Again Google docs. And you can see over here on the right hand side, this is how many people looking at it at any one time. So at that moment there are 11 people looking on. There could be 40 or 50 people looking at any one moment. And they’re looking and they’re helping us make corrections.

JS: So, wait– this data set is editable?

SR: No, we haven’t made it editable, because we’ve had a bad experience people coming to editable ones and mucking around, you know, putting swear words on stuff.

JS: So how do they help you?

SR: Well they’ll put stuff in the comments field and I’ll go in and put it on the spreadsheet. Because I want a sheet that people can still download. So now we’ve got, we’re now up to 203. We’ve doubled the amount of villains thanks to our readers. It’s Doctor Who. And it just shows we’re an eclectic– we’re a broad church on the Data Blog. Everything can be data. And that’s data. We’ve got number of appearances per villain, and it’s a program that people really care about. And it’s about as British as it’s possible to get. But then we also have other stuff too– and there we go, crashed again.
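[The villain tally itself is a simple count, and folding in vetted reader suggestions is just a second pass over the same counter. A toy sketch with made-up data:]

    from collections import Counter

    # Toy data standing in for the downloaded sheet and the reader comments.
    sheet_appearances = ["Daleks", "Cybermen", "Daleks", "The Master", "Daleks"]
    reader_additions = ["Sea Devils", "Zygons", "Cybermen"]

    tally = Counter(sheet_appearances)
    tally.update(reader_additions)   # the editor vets the comments, then adds them

    for villain, appearances in tally.most_common():
        print(villain, appearances)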

JS: Well let me just ask you a few questions, and take this opportunity to ask you some broader questions. Because we can do this all day. And I have. I’ve spent hours on your data blog because I’m a data geek. But let’s sort of bring it to some general questions here.

SR: Okay. Go for it.

JS: So first of all, I notice you have the Data Blog, you also have the world data index.

SR: Yes. Now the idea of that was that, obviously lots of governments around the world have started to open up their data. And around the time that the British government was– a lot of developers here were involved in that project — we started to think, what can we do around this that would help people, because suddenly we’ve got lots of sites out there that are offering open government data. And we thought, what if we could just gather them all together into one place. So you’ve got a single search engine. And that’s how we set up the world data search. Sorry to point you at the screen again.

JS: No that’s fine, that’s fine.

SR: Basically, so what we did, we started off with just Australia, New Zealand, UK and America. And basically what this site does, is it searches all of these open government data sites. Now we’ve got Australia, Toronto in Canada, New Zealand, the UK, London, California, San Francisco, and data.gov.

So say you search for “crime,” say you’re interested in crime. There you go. So you come back here, you see you’ve got results here from the UK, London, you’ve got results from data.gov in America, San Francisco, New Zealand and Australia. Say you’re interested in just seeing– you live in San Francisco and you’re only interested in San Francisco results. You’ve three results. And there you go, you click on that.

And you’re still within the Guardian site because what we’re asking people to do is help us rank the data, and submit visualizations and applications. So we want people to tell us what they’ve done with the data.

But anyway if you go and click on that, and you click on “download,” and it will start downloading the data for you. Or, what it will do is take you to the terms and conditions. We don’t bypass any T&Cs. The T&C’s come alongside. But you click on that, you agree to that, and then you get the data. So we really try and make it easy for people. There you go. And this is the crime incidence data. Very variable. This is great because it’s KML files, so if you wanted to visualize that you get really great information. It’s all sorts of stuff. Sometimes it’s CSVs.

JS: What’s a KML file?

SR: So, Google Earth.

JS: Okay.

SR: Sorry. So, it’s mapping, a mapping file straight away.
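[For readers who, like JS, have not met KML: it is an XML dialect that Google Earth reads, and placemark names and coordinates can be pulled out with Python's standard library. A minimal sketch with a hypothetical file name:]

    import xml.etree.ElementTree as ET

    NS = {"kml": "http://www.opengis.net/kml/2.2"}
    tree = ET.parse("crime_incidents.kml")   # hypothetical file name

    for placemark in tree.getroot().iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = placemark.findtext("kml:name", default="(unnamed)", namespaces=NS)
        coords = placemark.findtext(".//kml:coordinates", default="", namespaces=NS).strip()
        print(name, coords)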

SR: Okay, so one of the things we ask people to do is to submit visualizations and applications they’ve produced. So for instance, London has some very very good open data. If you haven’t looked around the Data Store, it’s really worth going to. And one of these things they do is they provide a live feed of all the London traffic cameras. You can watch them live. And this is a lovely thing, because what somebody’s done is they’ve written an iPad application. So you can watch live TFL, Transport for London, traffic cameras on your iPad.

And you see that data set has been rated. A couple of people have gone in there and rated it. You’ve got a download button, the download is XML. So we try and help people around this data. And this is growing now. Every time somebody launches an open government data site we’re gonna put it on here, and we’re working on a few more at the moment. So we want it to be the place that people go to. Every time you Google “world government data” it pops up at the top, which is what you want. You want people who are just trying to compare different countries and don’t know where to start, to help them find a way through this maze of information that’s out there.

JS: So do you intend to do this for every country in the world?

SR: Every country in the world that launches an open government data site, we’ll whack it on here. And we’re working– at the moment there are about 20 decent open government data sites around the world. We’re picking those up. We’ve got on here now, how many have we got? One, two, three, four, five, six, seven, eight. We’ll have 20 on in the next couple of weeks. We’re really working through them at the moment.

And what this does is, it scrapes them. So basically, we don’t– for us it’s easy to manage because we don’t have to update these data sets all the time. The computer does that for us. But basically, what we do provide people with is context and background information, because you’re part of the data site there.

JS: So let me make sure I have this clear. So you’re not sucking down the actual data, you’re sucking down the list and descriptions of the data sets available?

SR: Absolutely. So we’re providing people, because basically we want it to be as updated as possible. We don’t– if we just uploaded onto our site, that would kind of be pointless, and it would mean it would be out of date. This way, if something pops up on data.gov and stays there, we’ll get it quick on here. We’ll help people find it. Helping people find the data, that’s our mission here. It’s not just generating traffic, it’s to help people find the information, because we want people to come to us when they’re looking for data.
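[The index harvests metadata, not the data itself. A minimal illustration of that pattern in Python; the portal URL and the HTML selectors below are invented, and every real catalogue needs its own scraper:]

    import requests
    from bs4 import BeautifulSoup

    CATALOGUE_URL = "https://example.gov/datasets"   # hypothetical portal

    html = requests.get(CATALOGUE_URL, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    index = []
    for entry in soup.select("div.dataset"):         # selector is an assumption
        title = entry.select_one("h3").get_text(strip=True)
        link = entry.select_one("a")["href"]
        index.append({"title": title, "url": link})

    # Only this metadata is stored; the data itself stays on the source portal,
    # so the index never goes stale the way a mirrored copy would.
    print(len(index), "datasets indexed")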

JS: So, okay. You’ve talked about, it sounds like, two different projects. The Data Blog. where you collect and clean up and present data that you–

SR: That we find interesting. We’re selective.

JS: In the process of the Guardian’s newsgathering.

SR: Yeah, and just things that are interesting anyway. So the Doctor Who post that we were just looking at is just interesting to do. It’s not anything we’re going to do a story about. And often they’ll be things that are in the news, say that day, and I’ll think “oh that’s a good thing to put on the Data Blog.” So it could be crime figures, or it could be– and sometimes, the side effect of that is a great side effect because you end up with a piece in the paper, or a piece on the web site. But often it might be the Data Blog is the only place to get that information.

JS: And you index world government data sites.

SR: Yeah, absolutely.

JS: Does the Guardian do anything else with data?

SR: Yeah, well what we do is, we’re doing a lot of Guardian research with data. So what we want to do is give people a kind of way into that. So for instance, we do do a lot of data-based projects. So for instance we’re doing an executive pay survey of all the biggest companies, how much they pay their bosses and their chief executives. That has always been a thing the paper’s always done for stories. And now what we’ll do is we’ll make that stuff available– that data available for people. So instead of just raw data journalism, it’s quite old data journalism. We’ve been doing it for ten years. But we used to just call it a survey. Now it’s data journalism, because it’s getting stories out of numbers. So we’ll work with that, and we’ll publish that information for people to see. And there are a couple of big projects coming up this week, which I really can’t tell you about, but next week it will be obvious what they are.

JS: Probably by the time this goes up we’ll be able to link to them.

[Simon was referring to the Guardian's data journalism work on the leaked Afghanistan war logs, described in a thorough post on the Data Blog.]

SR: Yeah, I’ll mail you about them. But we’ve got now an area of expertise. So increasingly what I’m finding is that I’m getting people coming to me within The Guardian, saying, so we’ve got this spreadsheet, well how can I do this? So for instance that Academies thing we were just looking at, we were really keen to find out which areas were the most, where the most schools were, for the paper. The correspondent wanted to know that. So actually, because we’ve got this area of expertise now in managing data, we’re becoming kind of a go-to place within The Guardian, for journalists who are just writing stories where they need to know something, or they need to find some information out, which is an interesting side effect. Because it used to be that journalists were kind of scared of numbers, and scared of data. I really think that was the case. And now, increasingly, they’re trying to embrace that, and starting to realize you can get stories out of it.

JS: Well that’s really interesting. Let’s talk for a minute about how this applies to other newsrooms, because it’s– as you say, journalists have been traditionally scared of data.

SR: Yeah, absolutely. You could say they prided themselves, in this country anyway, they prided themselves on lack of mathematical ability. I would say.

JS: Which seems unfortunate in this era.

SR: Yeah, absolutely. Yeah, yeah, absolutely.

JS: But especially a lot of our readers are from smaller newsrooms, and so what kind of technical capability do you need to start tracking data, and publishing data sets?

SR: I think it’s really minimal. I mean, the thing is that actually, what we’re doing is really working with a basic, most of the time just basic spreadsheet packages. Excel or whatever you’ve got. Excel is easy to use, but it could be any package really. And we’re using Google spreadsheets, which again is widely available for people to use. We’re using visualization tools which are again, ManyEyes or Timetric which are widely available and easy to use. I think what we’re doing is just bringing it together.

I think traditionally that journalists wouldn’t regard data journalism as journalism. It was research. Or, you know, how is publishing data– is that journalism? But I think now, what is happening is that actually, what used to happen is that we were the kind of gatekeepers to this information. We would keep it to ourselves. So we didn’t want our rivals to get ahold of it, and give them stories. We’d be giving stories away. And we wouldn’t believe that people out there in the world would have any contribution to make towards that. Now, that’s all changed now. I think now we’ve realized that actually, we’re not always the experts. Be it Doctor Who or Academy schools, there’s somebody out there who knows a lot more than you do, and can thus contribute. So you can get stories back from them, in a way. So we’re receiving the information much more.

JS: So you publish the data, and then other people build stories out of it, is that what you’re saying?

SR: Other people will let us know– well, we publish say, well that’s an interesting story, or this is a good visualization. We’ve published data for other people to visualize. We thought, that’s quite an interesting thing to mash it up with, we should do that ourselves. So there’s that thing, and there’s also the fact that if you put the information out there, you always get a return. You get people coming back.

So for instance the Academies thing today that we were talking about. We’ve had people come back saying, well I live in Derbyshire and I know that those schools are in quite wealthy areas. So we start to think, well is there a trend towards schools in wealthy areas going to this, and schools in poorer areas not going to this.

So it gives you extra stories or extra angles on stories you wouldn’t think of. And I think that’s part of it. And I think partly there’s just the realization that just publishing data in itself, because it’s interesting, is a journalistic enterprise. Because I think you have to apply journalistic treatment to that data. You have to choose the data in a selective, editorial fashion. And I think you have to process it in a way that makes it easy for people to use, and useful to people.

JS: So last question here, which is of course going to be on many editors’ and publishers’ minds.

SR: Sure.

JS: Let’s talk about traffic and money. How does this contribute to the business of The Guardian?

SR: Okay, it’s a new– it’s an experiment for us, but traffic-wise it’s been pretty healthy. We’ve had– during the election we were getting a million page impressions in a month. Which is not bad. On the Data Blog. Now, as a whole, out of the 36 million that The Guardian gets, it doesn’t seem like a lot. But actually, in the firmament of Guardian web sites that’s not bad. That’s kind of upper tier. And this is only after being around for a year.

So in terms of what it gives us, it gives the same as producing anything that produces traffic gives us. It’s good for the brand, and it’s good for The Guardian site. In the long run, I think that there is probably canny money to be made out of there, for organizations that can manage and interpret data. I don’t know exactly how, but I think we’d have to be pretty dumb if we don’t come up with something. I’d be very surprised. It’s an area where there’s such a lot of potential. There are people who don’t really know how to manage data and don’t really know how to organize data that– for us to get involved in that area. I really think that.

But also I think that just journalistically, it’s as important to do this as it is to write a piece about a fashion week or anything else we might employ a journalist to do. And in a way it’s more important, because if The Guardian is about open information, which– since the beginning of The Guardian we’ve campaigned for freedom of information and access to information, and this is the ultimate expression of that.

And we, on the site, we use the phrase “facts are sacred.” And this comes from the famous C. P. Scott who said that “comment is free,” which as you know is the name of our comment site, but “facts are sacred” was the second part of the saying. And I kinda think that is– you can see it on the comment site, there you go. “Comment is free, but facts are sacred.” And that’s what The Guardian’s about. I really think that, you know, this says a lot about the web. Interestingly, I think that’s how the web is changing, in the sense that a few years ago it was just about comment. People wanted to say what they thought. Now I think it’s, increasingly, people want to find out what the facts are.

JS: All right, well, thank you very much for a thorough introduction to The Guardian’s data work.

SR: Thanks a lot.

Data Blog posts are often related to or supporting of news stories, but not always. Rogers sees the publishing of interesting data as a journalistic act that stands alone, and is clear on where the newsroom adds value:

I think you have to apply journalistic treatment to data. You have to choose the data in a selective, editorial fashion. And I think you have to process it in a way that makes it easy for people to use, and useful to people.

The Guardian curates far more data than it creates. Some data sets are generated in-house, such as the Guardian’s yearly executive pay surveys, but more often the data already exists in some form, such as a PDF on a government web site. The Guardian finds such documents, scrapes the data into spreadsheets, cleans it, and adds context in a Data Blog post. But they also maintain an index of world government data which scrapes open government web sites to produce a searchable index of available data sets.

“Helping people find the data, that’s our mission here,” says Rogers. “We want people to come to us when they’re looking for data.”

In alignment with their open strategy, The Guardian encourages re-use and mashups of their data. Readers can submit apps and visualizations that they’ve created, but data has proven to be just as popular with non-developers — regular folks who want the raw information.

Sometimes readers provide additional data or important feedback, typically through the comments on each post. Rogers gives the example of a reader who wrote in to say that the Academy schools listed in his area in a Guardian data set were in wealthy neighborhoods, raising the journalistically interesting question of whether wealthier schools were more likely to take advantage of this charter school-like program. Expanding on this idea, Rogers says,

What used to happen is that we were the kind of gatekeepers to this information. We would keep it to ourselves. So we didn’t want our rivals to get ahold of it, and give them stories. We’d be giving stories away. And we wouldn’t believe that people out there in the world would have any contribution to make towards that.

Now, that’s all changed now. I think now we’ve realized that actually, we’re not always the experts. Be it Doctor Who or Academy schools, there’s somebody out there who knows a lot more than you do, and can thus contribute.

So you can get stories back from them, in a way. … If you put the information out there, you always get a return. You get people coming back.

Perhaps surprisingly, data also gets pretty good traffic, with the Data Blog logging a million hits a month during the recent election coverage. “In the firmament of Guardian web sites that’s not bad. That’s kind of upper tier,” says Rogers. “And this is only after being around for a year.” (The even younger Texas Tribune also finds its data pages popular, accounting for a third of total page views.)

Rogers and I also discussed the process of getting useful data out of inept or uncooperative governments, the changing role of data specialists in the newsroom, and how the Guardian tapped its readers to produce the definitive database of Doctor Who villains. Here’s the transcript, lightly edited.

JS: All right. So. I’m here with Simons Rogers in the Guardian newsroom in London, and you’re the editor of the Data Blog.

SR: That’s right, and I’m also a news editor so I work across the organization on data journalism, essentially.

JS: So, first of all, can you tell us what the Data Blog is?

SR: Ok, well basically it came about because, as I said I was a news editor working a lot with graphics, and we realized we were just collecting enormous amounts of data. And we though, well wouldn’t our readers be interested in seeing that? And when the Guardian Open Platform launched, it seemed a good time to think about opening up– we were opening up the Guardian to technical development, so it seemed a good time to open up our data collections as well.

And also it’s the fact that increasingly we’ve found people are after raw information. If you looked– and there’s lots of raw information online, but if you start searching for that information you just get bewildering amounts of replies back. If if you’re looking for, say, carbon emissions, you get millions of entries back. So how do you know what the right set of data is? Whereas we’ve already done that set of work for our readers, because we’ve had to find that data, and we’ve had to choose it, and make an editorial selection about it, I suppose. So we thought we were able to cut out the middle man for people.

But also we kind of thought when we launched it, actually, what we’d be doing is creating data for developers. There seemed to be a lot of developers out there at that point who were interested in raw information, and they would be the people who would use the data blog, and the open platform would get a lot more traffic.

And what actually happened, what’s been interesting about it, is that– what’s actually happened is that it’s been real people who have been using the Data Blog, as much as developers. Probably more so than developers.

JS: What do you mean “real people”?

SR: Real people, I suppose what I mean is that, somebody who’s just interested in finding out what a number is. So for instance, here at the moment we’ve got a big story about a government scheme for building schools, which has just been cut by the new government. It was set up by the old government, who invested millions of pounds into building new school buildings. And so, we’ve got the full list of all the schools, but the parliamentary constituency that they’re in, and where they are and what kind of project they were. And that is really, really popular today, that’s one of our biggest things, because there’s a lot of demonstrations about it, it’s a big issue of the day. And so I would guess that 90% of people looking at it are just people who want to find out what the real raw data is.

And that’s the great thing about the internet, it gives you access to the raw, real information. And I think that’s what people really crave. They want the interpretation and the analysis from people, but they also want the veracity of seeing the real thing, without having it aggregated or put together. They just want to see the raw data.

JS: So you publish all of the original numbers that you get from the government?

SR: Well exactly. The only time– with the Data Blog, I try to make it as newsy as possible. So it’s often hooked around news stories of the day. Partly because it helps the traffic, and you’re kind of hooking on to existing requirements.

Obviously we do– it’s just a really eclectic mix of data. And I can show you the screen, for a sec.

JS: All right. Let’s see something.

SR: Okay, so this is the data blog today. So obviously we’ve got Afghanistan at the top. Afghanistan is often at the top at the moment. This is a full list of everybody who’s died, every British casualty who’s died and been wounded over time. So you’ve got this data here. We use, I tend to use a lot of third party services. This is a company called Timetric, who are very good at visualizing time series data. It takes about five minutes to create that, and you can roll over and get more information.

JS: So is that a free service?

SR: Yeah, absolutely free, you just sign up, and you share it. It works a bit like Many Eyes, you know the IBM service.

JS: Yeah.

SR: We’ll embed these Google docs. We use Google docs, Google spreadsheets to share all our information because it’s very for people to download it. So say you want to download this data. You click on the link, and it will take you through in a second to, there you go, it’s the full Google spreadsheet. And you’ve got everything on here. You’ve got, these are monthly totals, which you can’t get anywhere else, because nobody else does that information.

JS: What do you mean nobody else does it?

SR: Well nobody else bothers to put it together month by month. You can get totals by year from, iCasualties I think do it, but we’ve just collected some month by month, because often we’ve had to draw graphics where it’s month by month. It’s the kind of thing, actually it’s quite interesting to be able to see which month was the worst for casualties.

We’ve got lists of names, which obviously are in a few places. We collect Afghanistan wounded statistics which are terribly confused in the UK, because what they do is they try and make them as complicated as possible. So, the most serious ones, NOTICAS is where your next of kin is notified. That’s a serious event, but also you’ve got all those people evacuated. So anyway, this kind of data. We also keep amputation data, which is a new set that the government refused to release until recently, and a Guardian reporter was instrumental in getting this data released. So we kind thought, maybe we should make this available for people.

So you get all this data, and then what you can do, if you click on “File” there, you can download it as Excel, XML, CSV, or whatever format you want. So that’s why we use Google speadsheets. It’s the kind of thing that’s a very, very easily accessible format for people.

So really what we do is we try and encourage a community, a community to grow up around data and information. So every post has got a talk facility on it.

Anyway, going through it. So this is today’s Data Blog, where you’ve got Afghanistan, Academy schools in the UK. The schools are run by the state, pretty much.

JS: So just to clarify this for the American audience, what’s an Academy school?

SR: Ok, well basically in the UK most schools are state schools, that most children go to. State schools are, we all pay for them, they’re paid for out of our taxes. And they’re run at a local level, which obviously has it’s advantages because it means that you are, kind of, working to an area. What the new government’s proposing to do is allow any school that wants to to become an Academy. And what an Academy is is a school that can run its own finances, and own affairs.

And what we’ve got is we’ve got the data, the government’s published the data — as a PDF of course because governments always publish everything as a PDF, in this country anyway — and what they give you, which we’ve scraped here, is a list of every school in the UK which has expressed an interest. So you’ve got the local authority here, the name of the school, type of school, the address, and the post code. Which is great, because that’s good data, and because it’s on a PDF we can get that into a spreadsheet quite easily.

JS: So did you have to type in all of those things from a PDF, or cut and paste them?

SR: Good god no. No, no, we have, luckily we’ve got a really good editorial support team here, who are, thanks to the Data Blog, are becoming very experienced at getting data off of PDFs. Because every government department would much rather publish something as a PDF, so they can act as if they’re publishing the data but really it’s not open.

JS: So that’s interesting, because in the UK and the US there’s this big government publicity about, you know, we’re publishing all this data.

SR: Absolutely.

JS: But you’re saying that actually–

SR: It’s not 100 percent yet. So, I’ll show you in a second that what they tend to do is just publish– most government departments still want to publish stuff as PDFs. They can’t quite get out of that thing. Or want to say, why would somebody want a spreadsheet? They don’t really get it. A lot of people don’t get it.

And, we wanted the spreadsheet so you can do stuff like this, which is, this is a map of schools interested in becoming Academies by area. And so because we have that raw data in spreadsheet form we can work out how many in the area. You can see suddenly that this part of England, Kent, has 99 schools, which is the biggest in the country. And only one area, which is Barking, up here, in London, which is, sorry, is down here in London, but anyway that has no schools applying at all.

And the government also always said that at the beginning that it would mainly be schools which weren’t “outstanding” would apply. But actually if you look at the figures, which again, we can do, the majority of them are outstanding schools. So they’re already schools which are good, which are applying to become academies. Which kind of isn’t the point. But that kind of analysis, that’s data journalism in a sense. It’s using the numbers to get a story, and to tell a story.

JS: And how long did that story take you to put together? To get the numbers, and do the graphics, and…?

SR: Well, I was helped a bit, because I got, I’ve had one of my helpers who works in editorial support to get the data onto a spreadsheet. And in terms of creating the graphic we have a fantastic tool here, which is set up by one of our technical development team who are over there, and what it does, is it allows you to paste a load of data, geographic data, into this box, and you tell it what kind, is it parliamentary constituency, or local authority, or educational authority, or whatever, however the different regional differentiations we have in the UK, and it will draw a map for you. So this map here was drawn by computer, basically, and then one of the graphics guys help sort out the labels and finesse it and make it look beautiful. But it saves you the hard work of coloring up all those things. So actually that took me maybe a couple of hours. In total.

JS: How about getting the data, how long did that take?

SR: Oh well luckily that data– you know the government makes the data available. But like I say, as a PDF file. So this is the government site, and that’s the list there, and you open it, it opens as a PDF. Because we’ll link to that.

But luckily the guys in the ESD [editorial services department] are very adept now, because of the Data Blog, at getting data into spreadsheets. So, you know they can do that in 20 minutes.

JS: So how many people are working on data overall, then?

SR: Well, in terms of– it’s my full time job to do it. I’m lucky in that I’ve got an awful lot of people around here who have got an interest who I can kind of go and nudge, and ask. It’s a very informal basis, and we’re looking to formalize that, at the moment. We’re working on a whole data strategy, and where it goes. So we’re hoping to kind of make all of these arrangements a bit more formal. But at the moment I have to fit into what other people are doing. But yeah, we’ve got a good team now that can help, and that’s really a unique thing.

So I was going through the Data Blog for you. So this is a typical, a weird day, so schools, and then we’ve got another schools thing because it’s a big schools day today. This is school building projects scrapped by constituency, full list. Now, this is another where the government didn’t make the data easily available. The department for education published a list of all the school projects that were going to be stopped when the government cut the funding, some of which is going towards creating Academy schools, which is why this is a bit of an issue in the country at the moment. And we want to know by constituency how it was working. So which MPs were having the most school projects cut, in their constituency. And we couldn’t get that list out of the department of education, but one MP had lodged it with the House of Commons library. So we managed to get it from the House of Commons library. But it didn’t come in a good form, it came in a PDF again, so again we had to get someone from tech to sort it out for us.

But the great thing is that we can do something like this, which is a map of projects stopped by constituency, by MP. And most of the projects we’ve stopped were in Labour seats. As you know Labour are not in power at the moment. So we can do some of this sort of analysis which is great. So there were 418 projects stopped in Labour constituent seats, and 268 stopped in conservative seats. So basically 40% of Labour MPs had a project stopped, at least one project stopped in their seat, compared to only 27% of Conservatives, and 24% of the Dems who are in power at the moment.

JS: So would it be accurate to say the data drove this story, or showed this story, or…?

SR: Data showed this story, which is great, but the one thing, the caveat — of course, the raw numbers are never 100% — the caveat was there were more projects going on in Labour areas because Labour government, previous government which is Labour set up the projects, and they gave more projects to Labour areas. So you can read it either way.

JS: And you said this in the story?

SR: We said this in the story. Absolutely. We always try and make the caveats available for people. So that’s a big story today, because of there are demonstrations about it in London. You’ve come to us on a very education-centered day today.

But there’s other stuff on the blog too. This is a very British thing. We did this because we thought it would be an interesting project to do. I had somebody in for a week and they didn’t have much to do so I got them to make a list of every Doctor Who villain ever.

JS: This was an intern project?

SR: This was an intern project. We kinda thought, yeah, we’ll get a bit of traffic. And we’ve never had so much involvement in a single piece ever. It’s had 500 retweets, and when you think most pieces will get 30 or 40, it’s kind of interesting. The traffic has been through the roof. And the great thing is, so we created–

JS: Ooh, what’s this? This is good.

SR: It’s quite an easy– we use ManyEyes quite a lot, which is very very quick to create lovely little graphics. And this is every single Doctor Who villain since the start of the program, and how many times they appear. So you see the Daleks lead the way in Doctor Who.

JS: Yeah, absolutely.

SR: Followed by the Cybermen, and the Masters in there a lot. And there are lots of other little things. But we started off with about 106 villains in total, and now we’re up to– we put it out there and we said to people, we know this isn’t going to be the complete list, can you help us? And now we’ve got 212. So my weekend has basically been– I’ll show you the data sheet, it’s amazing. You can see the comments are incredible. You see these kinds of things, “so what about the Sea Devils? The Zygons?” and so on.

And I’ll show you the data set, because it’s quite interesting. So this is the data set. Again Google docs. And you can see over here on the right hand side, this is how many people looking at it at any one time. So at that moment there are 11 people looking on. There could be 40 or 50 people looking at any one moment. And they’re looking and they’re helping us make corrections.

JS: So, wait– this data set is editable?

SR: No, we haven’t made it editable, because we’ve had a bad experience people coming to editable ones and mucking around, you know, putting swear words on stuff.

JS: So how do they help you?

SR: Well they’ll put stuff in the comments field and I’ll go in and put it on the spreadsheet. Because I want a sheet that people can still download. So now we’ve got, we’re now up to 203. We’ve doubled the amount of villains thanks to our readers. It’s Doctor Who. And it just shows we’re an eclectic– we’re a broad church on the Data Blog. Everything can be data. And that’s data. We’ve got number of appearances per villain, and it’s a program that people really care about. And it’s about as British as it’s possible to get. But then we also have other stuff too– and there we go, crashed again.

JS: Well let me just ask you a few questions, and take this opportunity to ask you some broader questions. Because we can do this all day. And I have. I’ve spent hours on your data blog because I’m a data geek. But let’s sort of bring it to some general questions here.

SR: Okay. Go for it.

JS: So first of all, I notice you have the Data Blog, you also have the world data index.

SR: Yes. Now the idea of that was that, obviously lots of governments around the world have started to open up their data. And around the time that the British government was– a lot of developers here were involved in that project — we started to think, what can we do around this that would help people, because suddenly we’ve got lots of sites out there that are offering open government data. And we thought, what if we could just gather them all together into one place. So you’ve got a single search engine. And that’s how we set up the world data search. Sorry to point you at the screen again.

JS: No that’s fine, that’s fine.

SR: Basically, so what we did, we started off with just Australia, New Zealand, UK and America. And basically what this site does, is it searches all of these open government data sites. Now we’ve got Australia, Toronto in Canada, New Zealand, the UK, London, California, San Francisco, and data.gov.

So say you search for “crime,” say you’re interested in crime. There you go. So you come back here, you see you’ve got results here from the UK, London, you’ve got results from data.gov in America, San Francisco, New Zealand and Australia. Say you’re interested in just seeing– you live in San Francisco and you’re only interested in San Francisco results. You’ve three results. And there you go, you click on that.

And you’re still within the Guardian site because what we’re asking people to do is help us rank the data, and submit visualizations and applications. So we want people to tell us what they’ve done with the data.

But anyway if you go and click on that, and you click on “download,” and it will start downloading the data for you. Or, what it will do is take you to the terms and conditions. We don’t bypass any T&Cs. The T&C’s come alongside. But you click on that, you agree to that, and then you get the data. So we really try and make it easy for people. There you go. And this is the crime incidence data. Very variable. This is great because it’s KML files, so if you wanted to visualize that you get really great information. It’s all sorts of stuff. Sometimes it’s CSVs.

JS: What’s a KML file?

SR: So, Google Earth.

JS: Okay.

SR: Sorry. So, it’s mapping, a mapping file straight away.

SR: Okay, so one of the things we ask people to do is to submit visualizations and applications they’ve produced. So for instance, London has some very very good open data. If you haven’t looked around the Data Store, it’s really worth going to. And one of these things they do is they provide a live feed of all the London traffic cameras. You can watch them live. And this is a lovely thing, because what somebody’s done is they’ve written an iPad application. So you can watch live TFL, Transport for London, traffic cameras on your iPad.

And you see that data set has been rated. A couple of people have gone in there and rated it. You’ve got a download button, the download is XML. So we try and help people around this data. And this is growing now. Every time somebody launches an open government data site we’re gonna put it on here, and we’re working on a few more at the moment. So we want it to be the place that people go to. Every time you Google “world government data” it pops up at the top, which is what you want. You want people who are just trying to compare different countries and don’t know where to start, to help them find a way through this maze of information that’s out there.

JS: So do you intend to do this for every country in the world?

SR: Every country in the world that launches an open government data site, we’ll whack it on here. And we’re working– at the moment there are about 20 decent open government data sites around the world. We’re picking those up. We’ve got on here now, how many have we got? One, two, three, four, five, six, seven, eight. We’ll have 20 on in the next couple of weeks. We’re really working through them at the moment.

And what this does is, it scrapes them. So basically, for us it's easy to manage, because we don't have to update these data sets all the time; the computer does that for us. But what we do provide people with is context and background information, because you're still within the Guardian data site there.

JS: So let me make sure I have this clear. So you’re not sucking down the actual data, you’re sucking down the list and descriptions of the data sets available?

SR: Absolutely. So we're providing people with the index, because basically we want it to be as up to date as possible. If we just uploaded the data onto our own site, that would kind of be pointless, and it would quickly be out of date. This way, if something pops up on data.gov and stays there, we'll get it on here quickly. We'll help people find it. Helping people find the data, that's our mission here. It's not just generating traffic, it's helping people find the information, because we want people to come to us when they're looking for data.
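
[To make the mechanics concrete: what follows is a rough, hypothetical sketch of the kind of scraper Simon describes, one that harvests only the titles and landing-page links from a government data catalogue and stores that metadata rather than mirroring the data files. The URL and the CSS selector are placeholders invented for illustration; this is not the Guardian's actual code.]

```python
# Hedged sketch: index a data catalogue's metadata instead of copying its data.
# Requires the third-party packages `requests` and `beautifulsoup4`.
import requests
from bs4 import BeautifulSoup

CATALOGUE_URL = "https://data.example.gov/datasets"  # placeholder, not a real catalogue


def scrape_catalogue(url):
    """Return a list of {'title', 'link'} dicts for the datasets listed on one page."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    datasets = []
    # Assumes each dataset appears as a link inside an element with class
    # "dataset" -- a stand-in for whatever markup the real catalogue uses.
    for item in soup.select(".dataset a[href]"):
        datasets.append({
            "title": item.get_text(strip=True),
            "link": item["href"],
        })
    return datasets


if __name__ == "__main__":
    for entry in scrape_catalogue(CATALOGUE_URL):
        print(entry["title"], "->", entry["link"])
```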

JS: So, okay. You've talked about, it sounds like, two different projects. The Data Blog, where you collect and clean up and present data that you–

SR: That we find interesting. We’re selective.

JS: In the process of the Guardian’s newsgathering.

SR: Yeah, and just things that are interesting anyway. So the Doctor Who post that we were just looking at is just interesting to do; it's not anything we're going to do a story about. And often they'll be things that are in the news that day, and I'll think, oh, that's a good thing to put on the Data Blog. So it could be crime figures, or it could be– and sometimes the side effect is a great one, because you end up with a piece in the paper, or a piece on the web site. But often the Data Blog might be the only place to get that information.

JS: And you index world government data sites.

SR: Yeah, absolutely.

JS: Does the Guardian do anything else with data?

SR: Yeah, well, we're doing a lot of Guardian research with data, and what we want to do is give people a way into that. We do a lot of data-based projects. For instance, we're doing an executive pay survey of all the biggest companies: how much they pay their bosses and their chief executives. That's something the paper has always done for stories, and now we'll make that data available for people. So it's not brand-new data journalism, it's quite old data journalism. We've been doing it for ten years, but we used to just call it a survey. Now it's data journalism, because it's getting stories out of numbers. So we'll work with that, and we'll publish that information for people to see. And there are a couple of big projects coming up this week, which I really can't tell you about, but next week it will be obvious what they are.

JS: Probably by the time this goes up we’ll be able to link to them.

[Simon was referring to the Guardian's data journalism work on the leaked Afghanistan war logs, described in a thorough post on the Data Blog.]

SR: Yeah, I'll mail you about them. But we've now got an area of expertise. So increasingly I'm finding that people come to me within The Guardian saying, we've got this spreadsheet, how can I do this? So for instance that Academies thing we were just looking at: we were really keen to find out, for the paper, which areas had the most of those schools. The correspondent wanted to know that. Because we've got this area of expertise now in managing data, we're becoming kind of a go-to place within The Guardian for journalists who are writing stories where they need to know something, or need to find some information out, which is an interesting side effect. Because it used to be that journalists were kind of scared of numbers, and scared of data. I really think that was the case. And now, increasingly, they're trying to embrace that, and starting to realize you can get stories out of it.

JS: Well that’s really interesting. Let’s talk for a minute about how this applies to other newsrooms, because it’s– as you say, journalists have been traditionally scared of data.

SR: Yeah, absolutely. You could say that, in this country anyway, they prided themselves on a lack of mathematical ability. I would say.

JS: Which seems unfortunate in this era.

SR: Yeah, absolutely. Yeah, yeah, absolutely.

JS: But a lot of our readers in particular are from smaller newsrooms, so what kind of technical capability do you need to start tracking data and publishing data sets?

SR: I think it's really minimal. Most of the time we're just working with basic spreadsheet packages: Excel or whatever you've got. Excel is easy to use, but it could be any package really. And we're using Google spreadsheets, which again are widely available for people to work with information. We're using visualization tools such as ManyEyes or Timetric, which again are widely available and easy to use. I think what we're doing is just bringing it together.

I think traditionally journalists wouldn't regard data journalism as journalism. It was research. Or, you know, how is publishing data journalism at all? But what used to happen is that we were the gatekeepers to this information. We would keep it to ourselves. We didn't want our rivals to get hold of it and give them stories; we'd be giving stories away. And we wouldn't believe that people out there in the world had any contribution to make towards it. Now, that's all changed. I think we've realized that actually we're not always the experts. Be it Doctor Who or Academy schools, there's somebody out there who knows a lot more than you do, and can thus contribute. So you can get stories back from them, in a way. So we're receiving the information much more.
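
[To give a sense of how modest the tooling Simon mentions can be, here is a small sketch of the same sort-and-eyeball step you would normally do in Excel or a Google spreadsheet, written in Python only so that it is self-contained. The figures are invented for illustration, not real crime statistics.]

```python
# Load a small table, sort it, and print it -- the spreadsheet workflow in miniature.
import csv
import io

# Made-up figures standing in for a downloaded CSV of incidents by area.
RAW = """area,incidents
Camden,412
Hackney,387
Westminster,501
Islington,298
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Sort areas by incident count, highest first: the step that often surfaces the story.
rows.sort(key=lambda r: int(r["incidents"]), reverse=True)

for r in rows:
    print(f"{r['area']:<12} {r['incidents']:>5}")
```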

JS: So you publish the data, and then other people build stories out of it, is that what you’re saying?

SR: Other people will let us know. We'll publish something and they'll say, well, that's an interesting story, or this is a good visualization. We've published data for other people to visualize, and we've thought, that's quite an interesting thing to mash it up with, we should do that ourselves. So there's that, and there's also the fact that if you put the information out there, you always get a return. You get people coming back.

So for instance the Academies thing today that we were talking about. We’ve had people come back saying, well I live in Derbyshire and I know that those schools are in quite wealthy areas. So we start to think, well is there a trend towards schools in wealthy areas going to this, and schools in poorer areas not going to this.

So it gives you extra stories or extra angles on stories you wouldn’t think of. And I think that’s part of it. And I think partly there’s just the realization that just publishing data in itself, because it’s interesting, is a journalistic enterprise. Because I think you have to apply journalistic treatment to that data. You have to choose the data in a selective, editorial fashion. And I think you have to process it in a way that makes it easy for people to use, and useful to people.

JS: So last question here, which is of course going to be on many editors’ and publishers’ minds.

SR: Sure.

JS: Let’s talk about traffic and money. How does this contribute to the business of The Guardian?

SR: Okay, it's an experiment for us, but traffic-wise it's been pretty healthy. During the election we were getting a million page impressions a month on the Data Blog, which is not bad. Now, against the 36 million that The Guardian gets as a whole, it doesn't seem like a lot. But actually, in the firmament of Guardian web sites, that's not bad. That's kind of upper tier. And this is only after being around for a year.

So in terms of what it gives us, it gives us the same as anything else that produces traffic: it's good for the brand, and it's good for The Guardian site. In the long run, I think there is probably canny money to be made out there for organizations that can manage and interpret data. I don't know exactly how, but I think we'd have to be pretty dumb if we didn't come up with something. I'd be very surprised. It's an area with such a lot of potential. There are plenty of people who don't really know how to manage or organize data, and that's an opening for us to get involved in. I really think that.

But I also think that, just journalistically, it's as important to do this as it is to write a piece about a fashion week or anything else we might employ a journalist to do. And in a way it's more important, because The Guardian is about open information. Since the beginning of The Guardian we've campaigned for freedom of information and access to information, and this is the ultimate expression of that.

And we, on the site, we use the phrase “facts are sacred.” And this comes from the famous C. P. Scott who said that “comment is free,” which as you know is the name of our comment site, but “facts are sacred” was the second part of the saying. And I kinda think that is– you can see it on the comment site, there you go. “Comment is free, but facts are sacred.” And that’s what The Guardian’s about. I really think that, you know, this says a lot about the web. Interestingly, I think that’s how the web is changing, in the sense that a few years ago it was just about comment. People wanted to say what they thought. Now I think it’s, increasingly, people want to find out what the facts are.

JS: All right, well, thank you very much for a thorough introduction to The Guardian’s data work.

SR: Thanks a lot.

January 07 2010

19:11

Keeping Martin honest: Checking on Langeveld’s predictions for 2009

[A little over one year ago, our friend Martin Langeveld made a series of predictions about what 2009 would bring for the news business — in particular the newspaper business. I even wrote about them at the time and offered up a few counter-predictions. Here's Martin's rundown of how he fared. Up next, we'll post his predictions for 2010. —Josh]

PREDICTION: No other newspaper companies will file for bankruptcy.

WRONG. By the end of 2008, only Tribune had declared. Since then, the Star-Tribune, the Chicago Sun-Times, Journal Register Company, and the Philadelphia newspapers made trips to the courthouse, most of them right after the first of the year.

PREDICTION: Several cities, besides Denver, that today still have multiple daily newspapers will become single-newspaper towns.

RIGHT: Hearst closed the Seattle Post-Intelligencer (in print, at least), Gannett closed the Tucson Citizen, making those cities one-paper towns. In February, Clarity Media Group closed the Baltimore Examiner, a free daily, leaving the field to the Sun. And Freedom is closing the East Valley Tribune in Mesa, which cuts out a nearby competitor in the Phoenix metro area.

PREDICTION: Whatever gets announced by the Detroit Newspaper Partnership in terms of frequency reduction will be emulated in several more cities (including both single and multiple newspaper markets) within the first half of the year.

WRONG: Nothing similar to the Detroit arrangement has been tried elsewhere.

PREDICTION: Even if both papers in Detroit somehow maintain a seven-day schedule, we’ll see several other major cities and a dozen or more smaller markets cut back from six or seven days to one to four days per week.

WRONG, mostly: We did see a few other outright closings including the Ann Arbor News (with a replacement paper published twice a week), and some eliminations of one or two publishing days. But only the Register-Pajaronian of Watsonville, Calif. announced it will go from six days to three, back in January.

PREDICTION: As part of that shift, some major dailies will switch their Sunday package fully to Saturday and drop Sunday publication entirely. They will see this step as saving production cost, increasing sales via longer shelf life in stores, improving results for advertisers, and driving more weekend website traffic. The “weekend edition” will be more feature-y, less news-y.

WRONG: This really falls in the department of wishful thinking. It's a strategy I've been advocating for the last year or so: follow the audience to the web, jettison the overhead of printing and delivery, but retain the most profitable portion of the print product.

PREDICTION: There will be at least one, and probably several, mergers between some of the top newspaper chains in the country. Top candidate: Media News merges with Hearst. Dow Jones will finally shed Ottaway in a deal engineered by Boston Herald owner (and recently-appointed Ottaway chief) Pat Purcell.

WRONG AGAIN, but this one is going back into the 2010 hopper. Lack of capital by most of the players, and the perception or hope that values may improve, put a big damper on mergers and acquisitions, but there should be renewed interest ahead.

PREDICTION: Google will not buy the New York Times Co., or any other media property. Google is smart enough to stick with its business, which is organizing information, not generating content. On the other hand, Amazon may decide that they are in the content business…And then there’s the long shot possibility that Michael Bloomberg loses his re-election bid next fall, which might generate a 2010 prediction, if NYT is still independent at that point.

RIGHT about Google, and NOT APPLICABLE about Bloomberg (but Bloomberg did acquire BusinessWeek). The Google-NYT pipe dream still gets mentioned on occasion, but it won’t happen.

PREDICTION: There will be a mini-dotcom bust, featuring closings or fire sales of numerous web enterprises launched on the model of “generate traffic now, monetize later.”

WRONG, at least on the mini-bust scenario. Certainly there were closings of various digital enterprises, but it didn’t look like a tidal wave.

PREDICTION: The fifty newspaper execs who gathered at API’s November Summit for an Industry in Crisis will not bother to reconvene six months later (which would be April) as they agreed to do.

RIGHT. There was a very low-key round two with fewer participants in January, without any announced outcomes, and that was it. [Although there was also the May summit in Chicago, which featured many of the same players. —Ed.]

PREDICTION: Newspaper advertising revenue will decline year-over-year 10 percent in the first quarter and 5 percent in the second. It will stabilize, or nearly so, in the second half, but will have a loss for the year. For the year, newspapers will slip below 12 percent of total advertising revenue (from 15 percent in 2007 and around 13.5 percent in 2008). But online advertising at newspaper sites will resume strong upward growth.

WRONG, and way too optimistic. Full-year results won’t be known for months, but the first three quarters have seen losses in the 30 percent ballpark. Gannett and New York Times have suggested Q4 will come in “better” at “only” about 25 percent down. My 12 percent reference was to newspaper share of the total ad market, a metric that has become harder to track this year due to changes in methodology at McCann, but the actual for 2009 ultimately will sugar out at about 10 percent.

PREDICTION: Newspaper circulation, aggregated, will be steady (up or down no more than 1 percent) in each of the 6-month ABC reporting periods ending March 31 and September 30. Losses in print circulation will be offset by gains in ABC-countable paid digital subscriptions, including facsimile editions and e-reader editions.

WRONG, and also way too optimistic. The March period drop was 7.1 percent, the September drop was 10.6 percent, and digital subscriptions didn't have much impact.

PREDICTION: At least 25 daily newspapers will close outright. This includes the Rocky Mountain News, and it will include other papers in multi-newspaper markets. But most closings will be in smaller markets.

WRONG, and too pessimistic. About half a dozen daily papers closed for good during the year.

PREDICTION: One hundred or more independent local startup sites focused on local news will be launched. A number of them will launch weekly newspapers, as well, repurposing the content they’ve already published online. Some of these enterprises are for-profit, some are nonprofit. There will be some steps toward formation of a national association of local online news publishers, perhaps initiated by one of the journalism schools.

Hard to tell, but probably RIGHT. Nobody is really keeping track of how many hyperlocals are active, or their comings and goings. An authoritative central database would be a Good Thing.

PREDICTION: The Dow Industrials will be up 15 percent for the year. The stocks of newspaper firms will beat the market.

RIGHT. The Dow finished the year up 18.8 percent. (This prediction is the one that got the most "you must be dreaming" reactions last year.)

And RIGHT about newspapers beating the market (as measured by the Dow Industrials), which got even bigger laughs from the skeptics. There is no index of newspaper stocks, but on the whole, they’ve done well. It helps to have started in the sub-basement at year-end 2008, of course, which was the basis of my prediction. Among those beating the Dow, based on numbers gathered by Poynter’s Rick Edmonds, were New York Times (+69%), AH Belo (+164%), Lee Enterprises (+746%), McClatchy (+343%), Journal Communications (+59%), EW Scripps (+215%), Media General (+348%), and Gannett (+86%). Only Washington Post Co. (+13%) lagged the market. Not listed, of course, are those still in bankruptcy.

PREDICTION: At least one publicly-owned newspaper chain will go private.

NOPE.

PREDICTION: A survey will show that the median age of people reading a printed newspaper at least 5 days per week is now over 60.

UNKNOWN: I’m not aware of a 2009 survey of this metric, but I’ll wager that the median age figure is correct.

PREDICTION: Reading news on a Kindle or other e-reader will grow by leaps and bounds. E-readers will be the hot gadget of the year. The New York Times, which currently has over 10,000 subscribers on Kindle, will push that number to 75,000. The Times will report that 75 percent of these subscribers were not previously readers of the print edition, and half of them are under 40. The Wall Street Journal and Washington Post will not be far behind in e-reader subscriptions.

UNKNOWN, as far as the subscription counts go: newspapers and Kindle have not announced e-reader subscription levels during the year. The Times now has at least 30,000, as does the Wall Street Journal (according to a post by Staci Kramer in November; see my comment there as well). There have been a number of new e-reader introductions, but none of them look much better than their predecessors as news readers. My guess would be that by year end, the Times will have closer to 40,000 Kindle readers and the Journal 35,000. During 2010, 75,000 should be attainable for the Times, especially counting all e-editions (which include the Times Reader and which came to 53,353 weekdays and 34,435 Sundays for the six months ending Sept. 30).

PREDICTION: The advent of a color Kindle (or other brand color e-reader) will be rumored in November 2009, but won’t be introduced before the end of the year.

RIGHT: plenty of rumors, but no color e-reader, except Fujitsu’s Flepia, which is expensive, experimental, and only for sale in Japan.

PREDICTION: Some newspaper companies will buy or launch news aggregation sites. Others will find ways to collaborate with aggregators.

RIGHT: Hearst launched its topic pages site LMK.com. And various companies are working with EVRI, Daylife and others to bring aggregated feeds to their sites.

PREDICTION: As newsrooms, with or without corporate direction, begin to truly embrace an online-first culture, outbound links embedded in news copy, blog-style, as well as standalone outbound linking, will proliferate on newspaper sites. A reporter without an active blog will start to be seen as a dinosaur.

MORE WISHFUL THINKING, although there’s progress. Many reporters still don’t blog, still don’t tweet, and many papers are still on content management systems that inhibit embedded links.

PREDICTION: The Reuters-Politico deal will inspire other networking arrangements whereby one content generator shares content with others, in return for right to place ads on the participating web sites on a revenue-sharing basis.

YES, we’re seeing more sharing of content, with various financial arrangements.

PREDICTION: The Obama administration will launch a White House wiki to help citizens follow the Changes, and in time will add staff blogs, public commenting, and other public interaction.

NOT SO FAR, although a new Open Government Initiative was recently announced by the White House. This grew out of some wiki-like public input earlier in the year.

PREDICTION: The Washington Post will launch a news wiki with pages on current news topics that will be updated with new developments.

YES — kicked off in January, it’s called WhoRunsGov.com.

PREDICTION: The New York Times will launch a sophisticated new Facebook application built around news content. The basic idea will be that the content of the news (and advertising) package you get by being a Times fan on Facebook will be influenced by the interests and social connections you have established on Facebook. There will be discussion of, if not experimentation with, applying a personal CPM based on social connections, which could result in a rewards system for participating individuals.

NO. Although the Times has continued to come out with innovative online experiments, this was not one of them.

PREDICTION: Craigslist will partner with a newspaper consortium in a project to generate and deliver classified advertising. There will be no new revenue in the model, but the goal will be to get more people to go to newspaper web sites to find classified ads. There will be talk of expanding this collaboration to include eBay.

NO. This still seems like a good idea, but probably it should have happened in 2006 and the opportunity has passed.

PREDICTION: Look for some big deals among the social networks. In particular, Twitter will begin to falter as it proves to be unable to identify a clearly attainable revenue stream. By year-end, it will either be acquired or will be seeking to merge or be acquired. The most likely buyer remains Facebook, but interest will come from others as well and Twitter will work hard to generate an auction that produces a high valuation for the company.

NO DEAL, so far. But RIGHT about Twitter beginning to falter and still having no “clearly attainable” revenue stream in sight. Twitter’s unique visitors and site visits, as measured by Compete.com, peaked last summer and have been declining, slowly, ever since. Quantcast agrees. [But note that neither of those traffic stats count people interacting with Twitter via the API, through Twitter apps, or by texting. —Ed.]

PREDICTION: Some innovative new approaches to journalism will emanate from Cedar Rapids, Iowa.

YES, as described in this post and this post. See also the blogs of Steve Buttry and Chuck Peters. The Cedar Rapids Gazette and its affiliated TV station and web site are in the process of reinventing and reconstructing their entire workflow for news gathering and distribution.

PREDICTION: A major motion picture or HBO series featuring a journalism theme (perhaps a blogger involved in saving the world from nefarious schemes) will generate renewed interest in journalism as a career.

RIGHT. Well, I’m not sure if it has generated renewed interest in journalism as a career, but the movie State of Play featured both print reporters and bloggers. And Julie of Julie & Julia was a blogger, as well. [Bit of a reach there, Martin. —Ed.]

[ADDENDUM: I posted about Martin's predictions when he made them and wrote this:

I’d agree with most, although (a) I think there will be at least one other newspaper company bankruptcy, (b) I think Q3/Q4 revenue numbers will be down from 2008, not flat, (c) circ will be down, not stable, (d) newspaper stocks won’t beat the market, (e) the Kindle boom won’t be as big as he thinks for newspapers, and (f) Twitter won’t be in major trouble in [2009] — Facebook is more likely to feel the pinch with its high server-farm costs.

I was right on (a), (b), and (c) and wrong on (d). Gimme half credit for (f), since Twitter is now profitable and Facebook didn’t seem too affected by server expenses. Uncertain on (e), but I’ll eat my hat if “75 percent of [NYT Kindle] subscribers were not previously readers of the print edition, and half of them are under 40.” —Josh]

Photo of fortune-teller postcard by Cheryl Hicks used under a Creative Commons license.
