June 11 2013

17:00

OpenData Latinoamérica: Driving the demand side of data and scraping towards transparency

“There’s a saying here, and I’ll translate, because it’s very much how we work,” Miguel Paz said to me over a Skype call from Chile. “But that doesn’t mean that it’s illegal. Here, it’s ‘It’s better to ask forgiveness than to ask permission.’”

Paz is a veteran of the digital news business. The saying has to do with his approach to scraping public data from governments that may be slow to share it. He’s also a Knight International Journalism Fellow, the founder of Hacks/Hackers Chile, and a recent Knight News Challenge winner. A few years ago, he founded Poderopedia, a database of Chilean politicians and their many connections to political organizations, government offices, and businesses.

But freeing, organizing, and publishing data in Chile alone is not enough for Paz, which is why his next project, in partnership with Mariano Blejman of Argentina’s Hacks/Hackers network, is aimed at freeing data from across Latin America. Their project is called OpenData Latinoamérica. Paz and Blejman hope to build a centralized home where all regional public data can be stored and shared.

Their mutual connection through Hacks/Hackers is key to the development of OpenData Latinoamérica. The network will make itself, to whatever extent possible, available for troubleshooting and training as the project gets off the ground and civic hackers and media types learn both how to upload data sets and how to make use of the information they find there.

Another key partnership helping make OpenData Latinoamérica possible is with the World Bank Institute’s Global Media Development program, which is run by Craig Hammer. Hammer believes the data age is revolutionizing government, non-government social projects, and how we make decisions about everyday life.

“The question for us, is, What are we gonna do with the data? Data for what? Bridging that space between opening the data and how it translates into improving the quality of people’s lives around the world requires a lot of time and attention,” he says. “That’s really where the World Bank Institute and our programmatic work is focused.”

A model across the Atlantic

Under Hammer, the World Bank helped organize and fund Africa Open Data, a similar project launched by another Knight fellow, Justin Arenstein. “The bank’s own access-to-information policy provides for a really robust opportunity to open its own data,” Hammer says, “and in so doing, provide support to countries across regions to open their own data.”

Africa Open Data is still in beta, but bringing together hackers, journalists, and information in training bootcamps has already led to reform-producing journalism. In a post about the importance of equipping the public for the data age, Hammer tells the story of Irene Choge, a journalist from Kenya who attended a training session hosted by the World Bank in conjunction with Africa Open Data.

She…examined county-level expenditures on education infrastructure — specifically, on the number of toilets per primary school…Funding allocated for children’s toilet facilities had disappeared, resulting in high levels of open defecation (in the same spaces where they played and ate). This increased their risk of contracting cholera, giardiasis, hepatitis, and rotavirus, and accounted for low attendance, in particular among girls, who also had no facilities during their menstruation cycles. The end result: poor student performance on exams…Through Choge’s analysis and story, open data became actionable intelligence. As a result, government is acting: ministry resources are being allocated to correct the toilet deficiency across the most underserved primary schools and to identify the source of the misallocation at the root of the problem.

Hammer calls Africa Open Data a useful “stress test” for OpenData Latinoamérica, but Paz says the database was also a natural next step in a series of frustrations he and Blejman had encountered in their other work.

“Usually, the problem you have is: Everything is cool before the hackathon, and during the hackathon,” says Paz. “But after, it’s like, who are the people who are working on the project? What’s the status of the project? Can I follow the project? Can I be a part of the project?” The solution to this problem ended up being Hackdash, which was actually Blejman’s brainchild — an interface that helps hackers keep abreast of the answers to those questions and thereby shore up the legacy of various projects.

So thinking about ways that international hackers can organize and communicate across the region is nothing new to Paz and Blejman. “One hackathon, we would do something, and another person who didn’t know about that would do something else. So when we saw the Open Data Africa platform, we thought it was a really great idea to do in Latin America,” he says.

Blejman says the contributions of the World Bank have been essential to the founding of OpenData Latinoamérica, especially in organizing the data bootcamps. Hammer says he sees the role of the bank as building a bridge between civic hackers and media. “More than a platform,” he says, it’s “an institution in and of itself to help connect sources of information to government and help transform that data into knowledge and that knowledge into action.”

Giving people the tools to understand the power of data is an important tenet of Hammer’s open data philosophy. He believes the next step for Big Data is global data literacy, which he says is most immediately important for “very specific and arguably strategic public constituencies — journalists, media, civic hackers, and civil society.” Getting institutions, like newspapers, to embrace the importance of data literacy rather than relying on individual interest is just one goal Hammer has in mind.

“I’m not talking about data visualization skills for planet Earth,” he says. “I’m saying, it’s possible — or it should be possible — for anybody that wants to have these skills to have them. If we’re talking about data as the real democratizer — open data as meaningful democratization of information — then it has to be digestible and accessible and consumable by everyone and everybody who wants to access and digest and consume it.”

Increasing the desire of the public for more, freer data is what Hammer calls stoking the demand side. He says it’s great if governments are willingly making information accessible, but for it to be useful, people have to understand its power and seek to unleash it.

“What’s great about OpenData Latinoamérica is it’s in every way a demand-side initiative, where the public is liberating its own data — it’s scraping data, it’s cleaning it,” he says. “Open data is not solely the purview of the government. It’s something that can be inaugurated by public constituencies.”

For example, in Argentina, where the government came late to the open data game, Blejman says he saw a powerful demand for information spring up in hackers and journalists around him. When they saw what other neighboring countries had and what they could do with that information, they demanded the same, and Argentina’s government began to release some of that data.

“We need to think about open data as a service, because no matter how much advocacy from NGOs, people don’t care about ‘open data’” per se, Paz says. “They care about data because it affects their life, in a good or bad way.”

Another advantage Blejman and Paz had when heading into OpenData Latinoamérica was the existence of Junar, a Chilean software platform founded by Javier Pajaro, who was a frustrated analyst when he decided to embrace open data platforms and help others do the same. Blejman said that, while Africa Open Data opted for CKAN, using a local, Spanish-language company that was already familiar to members of the Hacks/Hackers network has strengthened the project, making it easier to troubleshoot problems as they arise. He also said Junar’s ability to give participating organizations more control fit nicely into their hands-off, crowd-managed vision for future day-to-day operation of the database.

Organizing efforts

Paz and Blejman have high hopes for the stories and growth that will come from OpenData Latinoamérica. “What we expect from these events is for people to start using data, encourage newspapers to organize around data themes, and have the central hub for what they want to consume,” Blejman said.

They hope to one day bring in data from every country in Latin America, but they acknowledge that some will be harder to reach than others. “Usually, the federated governments, it’s harder to get standardized data. So, in a country like Argentina, which is a federated state with different authorities on different levels, it’s harder to get standardized data than in a republic where there’s one state and no federated government,” says Paz. “But then again, in Chile, we have a really great open data and open government and transparency laws, but we don’t have great data journalism.” (Chile is a republic.)

Down the road, they’d also like to provide a secure way for anonymous sources to dump data to the site. Paz says in his experience as a news editor, 20–25 percent of scoops come from anonymous tips. But despite developments like The New Yorker’s recent release of Strongbox, OpenData Latinoamérica is still working out a secure method that doesn’t require downloading Tor, but is more secure than email. Blejman also added that, for now, whatever oversight they have over the quality and accuracy of the original data they’re working with is minimal: “At the end, we cannot control the original sources, and we are just trusting the organizations.”

But more than anything, Paz is excited about seeing the beginnings of the stories they’ll be able to tell. He plans to use documents about public purchases made by Chile’s government to build an app that allows citizens to track what their government is spending money on, and which companies are receiving those dollars through contracts.

Another budding story exemplifies the extent to which Paz has taken to heart Craig Hammer’s emphasis on building demand. In Chile, there is currently a significant outcry from students over the rising cost of education. Protests in favor of free education are ongoing. In response, Paz decided to harness this focus, energy, and frustration into a scrape-a-thon (or #scrapaton) to be held June 29 in Santiago. They will focus on scraping data on the owners of universities, companies that contract with universities, and who owns private and subsidized schools.

“There’s a joke that says if you put five gringos — and I don’t mean gringos in a disrespectful way — if you put five U.S. people in a room, they’re probably going to invent a rocket,” says Paz. “If you put five Chileans in a room, they’re probably going to fight each other. So one of the things — we’re not just building tools, we’re also building ways of working together, and making people trust each other.” Blejman added that he hopes the recent release of a Spanish-language version of the Open Data Handbook (El manual de Open Data) will further facilitate collaboration between hackers in various Latin American countries.

With a project of this size and scope, there are also some ambitious designs around measurement. Paz hopes to track how many stories and projects originate with datasets from OpenData Latinoamérica. Craig Hammer wants to quantify the social good of open data, a project he says is already underway via the World Wide Web Foundation’s collaboration with the Open Data for Development Camp.

“If there is a cognizable and evidentiary link between open data and boosting shared prosperity,” Hammer says, “then I think that would be, in many cases, the catalytic moment for open data, and would enable broad recognition of why it’s important and why it’s a worthwhile investment, and broad diffusion of data literacy would really explode.”

Hammer wants people to take ownership of data and realize it can help inform decisions at all levels, even for individuals and families. Once that advantage is made clear to the majority of the population, he says, the demand will kick in, and all kinds of organizations will feel pressured to share their information.

“There’s this visceral sense that data is important, and that it’s good. There’s recognition that opening information and making it broadly accessible is in and of itself a global public good. But it doesn’t stop there, right? That’s not the end,” he says. “That’s the beginning.”

Photo of Santiago student protesters walking as police fire water cannons and tear gas fills the air, Aug. 8, 2012 by AP/Luis Hidalgo.

February 10 2012

18:00

Still shaping the way people think about news innovation? A few reflections on the new KNC 2.0

As someone who probably has spent more time thinking about the Knight News Challenge than anyone outside of Knight Foundation headquarters — doing a dissertation on the subject will do that to you! — I can’t help but follow its evolution, even after my major research ended in 2010. And evolve it has: from an initial focus on citizen journalism and bloggy kinds of initiatives (all the rage circa 2007, right?) to a later emphasis on business models, visualizations, and data-focused projects (like this one) — among a whole host of other projects including news games, SMS tools for the developing world, crowdsourcing applications, and more.

Now, after five years and $27 million in its first incarnation, Knight News Challenge 2.0 has been announced for 2012, emphasizing speed and agility (three contests a year, eight-week turnarounds on entries) and a new topical focus (the first round is focused on leveraging existing networks). While more information will be coming ahead of the February 27 launch, here are three questions to chew on now.

Does the Knight News Challenge still dominate this space?

The short answer is yes (and I’m not just saying that because, full disclosure, the Knight Foundation is a financial supporter of the Lab). As I’ve argued before, in the news innovation scene, at this crossroads of journalism and technology communities, the KNC has served an agenda-setting kind of function — perhaps not telling news hipsters what to think regarding the future of journalism, but rather telling them what to think about. So while folks might disagree on the Next Big Thing for News, there’s little question that the KNC has helped to shape the substance and culture of the debate and the parameters in which it occurs.

Some evidence for this comes from the contest itself: Whatever theme/trend got funded one year would trigger a wave of repetitive proposals the next. (As Knight said yesterday: “Our concern is that once we describe what we think we might see, we receive proposals crafted to meet our preconception.”)

And yet the longer answer to this question is slightly more nuanced. When the KNC began in 2006, with the first winners named in 2007, it truly was the only game in town — a forum for showing “what news innovation looks like” unlike any other. Nowadays, a flourishing ecosystem of websites (ahem, like this one), aggregators (like MediaGazer), and social media platforms is making the storyline of journalism’s reboot all the more apparent. It’s easier than ever to track who’s trying what, which experiments are working, and so on — and seemingly in real time, as opposed to a once-a-year unveiling. Hence the Knight Foundation’s move to three quick-fire contests a year, “as we try to bring our work closer to Internet speed.”

How should we define the “news” in News Challenge?

One of the striking things I found in my research (discussed in a previous Lab post) was that Knight, in its overall emphasis, has pivoted away from focusing mostly on journalism professionalism (questions like “how do we train/educate better journalists?”) and moved toward a broader concern for “information.” This entails far less regard for who’s doing the creating, filtering, or distributing — rather, it’s more about ensuring that people are informed at the local community level. This shift from journalism to information, reflected in the Knight Foundation’s own transformation and its efforts to shape the field, can be seen, perhaps, like worrying less about doctors (the means) and more about public health (the ends) — even if this pursuit of health outcomes sometimes sidesteps doctors and traditional medicine along the way.

This is not to say that Knight doesn’t care about journalism. Not at all. It still pours millions upon millions of dollars into clearly “newsy” projects — including investigative reporting, the grist of shoe-leather journalism. Rather, this is about Knight trying to rejigger the boundaries of journalism: opening them up to let other fields, actors, and ideas inside.

So, how should you define “news” in your application? My suggestion: broadly.

What will be the defining ethos of KNC 2.0?

This is the big, open, and most interesting question to me. My research on the first two years of KNC 1.0, using a regression analysis, found that contest submissions emphasizing participation and distributed knowledge (like crowdsourcing) were more likely to advance, all things being equal. My followup interviews with KNC winners confirmed this widely shared desire for participation — a feeling that the news process not only could be shared with users, but in fact should be.

I called this an “ethic of participation,” a founding doctrine of news innovation that challenges journalism’s traditional norm of professional control. But perhaps, to some extent, that was a function of the times, during the roughly 2007-2010 heyday of citizen media, with the attendant buzz around user-generated content as the hot early-adopter thing in news — even if news organizations then, as now, struggled to reconcile and incorporate a participatory audience. Even while participation has become more mainstream in journalism, there are still frequent flare-ups, like this week’s flap over breaking news on Twitter, revealing enduring tensions at the “collision of two worlds — when a hierarchical media system in the hands of the few collides with a networked media system open to all,” as Alfred Hermida wrote.

So what about this time around? Perhaps KNC 2.0 will have an underlying emphasis on Big Data, algorithms, news apps, and other things bubbling up at the growing intersection of computer science and journalism. It’s true that Knight is already underwriting a significant push in this area through the (also just-revised) Knight-Mozilla OpenNews project (formerly called the Knight-Mozilla News Technology Partnership — which Nikki Usher and I have written about for the Lab). To what extent is there overlap or synergy here? OpenNews, for 2012, is trying to build on the burgeoning “community around code” in journalism — leveraging the momentum of Hacks/Hackers, NICAR, and ONA with hackfests, code-swapping, and online learning. KNC 2.0, meanwhile, talks about embracing The Hacker Way described by Mark Zuckerberg — but at the same time backs away a bit from its previous emphasis on open source as a prerequisite. It’ll be interesting to see how computational journalism — explained well in this forthcoming paper (PDF here) by Terry Flew et al. in Journalism Practice — figures into KNC 2.0.

Regardless, the Knight News Challenge is worth watching for what it reveals about the way people — journalists and technologists, organizations and individuals, everybody working in this space — talk about and make sense of “news innovation”: what it means, where it’s taking us, and why that matters for the future of journalism.

August 03 2011

14:00

Transparency, iteration, standards: Knight-Mozilla’s learning lab offers journalism lessons of open source

This spring, the Knight Foundation and Mozilla took the premise of hacks and hackers collaboration and pushed it a step further, creating a contest to encourage journalists, developers, programmers, and anyone else so inclined to put together ideas to innovate news.

Informally called “MoJo,” the Knight-Mozilla News Technology Partnership has been run as a challenge, the ultimate prize being a one-year paid fellowship in one of five news organizations: Al Jazeera English, the BBC, the Guardian, Boston.com, and Zeit Online.

We’ve been following the challenge from contest entries to its second phase, an online learning lab, where some 60 participants were selected on the basis of their proposal to take part in four weeks of intense lectures. At the end, they were required to pitch a software prototype designed to make news, well, better.

Through the learning lab, we heard from a super cast of web experts, like Chris Heilmann, one of the guys behind the HTML5 effort; Aza Raskin, the person responsible for Firefox’s tabbed browsing; and John Resig, who basically invented the jQuery JavaScript library; among other tech luminaries. (See the full lineup.)

There was a theme running through the talks: openness. Not only were the lectures meant to get participants thinking about how to make their projects well-designed and up to web standards, but they also generally stressed the importance of open-source code. (“News should be universally accessible across phones, tablets, and computers,” MoJo’s site explains. “It should be multilingual. It should be rich with audio, video, and elegant data visualization. It should enlighten, inform, and entertain people, and it should make them part of the story. All of that work will be open source, and available for others to use and build upon.”)

We also heard from journalists: Discussing the opportunities and challenges for technology and journalism were, among other luminaries, Evan Hansen, editor-in-chief of Wired.com; Amanda Cox, graphics editor of The New York Times; Shazna Nessa, director of interactive at the AP; Mohamed Nanabhay, head of new media at Al Jazeera English; and Jeff Jarvis.

In other words, over the four weeks of the learning lab’s lectures, we heard from a great group of some of the smartest journalists and programmers who are thinking about — and designing — the future of news. So, after all that, what can we begin to see about the common threads emerging between the open source movement and journalism? What can open source teach journalism? And journalism open source?

Finding 1:
* Open source is about transparency.
* Journalism has traditionally not been about transparency, instead keeping projects under wraps — the art of making the sausage and then keeping it stored inside newsrooms.

Because open-source software development often occurs among widely distributed and mostly volunteer participants who tinker with the code ad-hoc, transparency is a must. Discussion threads, version histories, bug-tracking tools, and task lists lay bare the process underlying the development — what’s been done, who’s done it, and what yet needs tweaking. There’s a basic assumption of openness and collaboration achieving a greater good.

Ergo: In a participatory news world, can we journalists be challenged by the ethics of open source to make the sausage-making more visible, even collaborative?

No one is advocating making investigative reporting an open book, but sharing how journalists work might be a start. As Hansen pointed out, journalists are already swimming in information overload from the data they gather in reporting; why not make some of that more accessible to others? And giving people greater space for commenting and offering correction when they think journalists have gone wrong — therein lies another opportunity for transparency.

Finding 2:
* Open source is iterative.
* Journalism is iterative, but news organizations generally aren’t (yet).

Software development moves quickly. Particularly in the open source realm, developers aren’t afraid to make mistakes and make those mistakes public as they work through the bugs in a perpetual beta mode rather than wait until ideas are perfected. The group dynamic means that participants feel free to share ideas and try new things, with a “freedom to fail” attitude that emphasizes freedom much more than failure. Failure, in fact, is embraced as a step forward, a bug identified, rather than backward. This cyclical process of iterative software development — continuous improvement based on rapid testing — stands in contrast to the waterfall method of slower, more centralized planning and deployment.

On the one hand, journalism has iterative elements, like breaking news. As work, journalism is designed for agility. But journalism within legacy news organizations is often much harder to change, and tends to be more “waterfall” in orientation: The bureaucracy and business models and organizational structures can take a long time to adapt. Trying new things, being willing to fail (a lot) along the way, and being more iterative in general are something we can learn from open-source software.

Finding 3:
* Open source is about standards.
* So is journalism.

We were surprised to find that, despite its emphasis on openness and collaboration, the wide world of open source is also a codified world with strict standards for implementation and use. Programming languages have documentation for how they are used, and there is generally consensus among developers about what looks good on the web and what makes for good code.

Journalism is also about standards, though of a different kind: shared values about newsgathering, news judgment, and ethics. But even while journalism tends to get done within hierarchical organizations and open-source development doesn’t, journalism and open source share essentially the same ideals about making things that serve the public interest. In one case, it’s programming; in the other case, it’s telling stories. But there’s increasingly overlap between those two goals, and a common purpose that tends to rise above mere profit motive in favor of a broader sense of public good.

However, when it comes to standards, a big difference between the open-source movement and journalism is that journalists, across the board, aren’t generally cooperating to achieve common goals. While programmers might work together to make a programming language easier to use, news organizations tend to go at their own development in isolation from each other. For example, The Times went about building its pay meter fairly secretly: While in development, even those in the newsroom didn’t know the details about the meter’s structure. Adopting a more open-source attitude could teach journalists, within news organizations and across them, to think more collaboratively when it comes to solving common industry problems.

Finding 4:
* Open-source development is collaborative, free, and flexible.
* Producing news costs money, and open source may not get to the heart of journalism’s business problems.

Open-source software development is premised on the idea of coders working together, for free, without seeking to make a profit at the expense of someone else’s intellectual property. Bit by bit, this labor is rewarded by the creation of sophisticated programming languages, better-and-better software, and the like.

But there’s a problem: Journalism can’t run on an open source model alone. Open source doesn’t give journalism any guidance for how to harness a business model that pays for the news. Maybe open-source projects are the kind of work that will keep people engaged in the news, thus bulking up traditional forms of subsidy, such as ad revenue. (Or, as in the case of the “open R&D” approach of open APIs, news organizations might use openness techniques to find new revenue opportunities. Maybe.)

Then again, the business model question isn’t, specifically, the point. The goal of MoJo’s learning lab, and the innovation challenge it’s part of, is simply to make the news better technologically — by making it more user-friendly, more participatory, etc. It’s not about helping news organizations succeed financially. In all, the MoJo project has been more about what open source can teach journalism, not vice versa. And that’s not surprising, given that the MoJo ethos has been about using open technologies to help reboot the news — rather than the reverse.

But as the 60 learning lab participants hone their final projects this week, in hopes of being one of the 15 who will win a next-stage invite to a hackathon in Berlin, they have been encouraged to collaborate with each other to fill out their skill set — by, say, a hack partnering with a hacker, and so forth. From those collaborations may come ideas not only for reinventing online journalism, but also for contributing to the iteration of open-source work as a whole.

So keep an eye out: Those final projects are due on Friday.

January 20 2011

15:30

Boston Hack Day Challenge: An open door to Boston.com

Count The Boston Globe among the growing number of organizations that want hackers to come in from the cold. On the weekend of Feb. 25 they’re holding a three-day event called the “Boston Hack Day Challenge” where developers, designers, coders and anyone else inclined to make apps will gather to “create new online and mobile products that can make life better for Bostonians.”

We’ve got our share of tech heads around the area thanks to schools like MIT and Harvard, not to mention start-ups (perhaps you’ve heard of SCVNGR?), and the Globe is looking to capitalize on that to help promote their new digital test kitchen, Beta.Boston.com.

In the last few years a number of companies, in and outside of media, have dabbled in hackathons, sometimes to try and associate their name with innovation, other times to try and find the best new talent and products to cherrypick. The New York Times started the Times Open series a few years ago to get New York’s tech community tied into the newspaper and help nudge along the concept of the journo-programmer. We’ve also seen journalists, programmers and developers come together in crises like last year’s earthquakes in Haiti, to try and build tools to aid in communication and emergency response. (And I would be remiss if I didn’t mention the work of Hacks/Hackers, which has held a number of developer events like Hacks/Hackers Unite.)

At the Boston Hack Day Challenge, teams will use the weekend to build a site or app dedicated to alleviating one problem or another in the Boston area. (One example would be something like the OpenMBTA app, which I can vouch for as making it easier to catch the bus or T.)

All of these fit quite nicely with Beta.Boston.com, where the Globe’s digital team has been quietly releasing online products, and highlighting apps and sites created by others, including Citizen’s Connect, an app to report issues to the mayor’s office. You’ll also find their early OpenBlock demo with news and data from Boston neighborhoods.

The team at the Globe says to keep an eye on the beta space as they roll out toys and features for BostonGlobe.com, the new subscriber site that will parallel Boston.com.

November 24 2010

19:11

Hacks/Hackers London

First of all, the Iraq War Logs:

Round One – The Cleaning

Documents, records and words are all hugely intimidating in their vastness. Some tools that help are MySQL, UltraEdit and Google Refine, but this stage is incredibly frustrating.

Round Two – The Problem

How do you tackle the types of documents? There were even small PDF files. Had to build a basic web interface for everyday queries. Needed multiple fields, and this part is extremely difficult, especially when you need to explain it to an editor. You have to have a healthy mistrust of the data. Asking the right questions is crucial. Asking something which the data is not structured to answer is the real problem.

Round Three – What We Did

Looked at key incidents and names of interest which the media had previously reported. Trick was to try and find what we didn’t know. First start by looking at categories of deaths by time. Found that it was murders rather than weapons fire that killed the most. It was the civilians’ own in-fighting. Use Tableau. Up to 100,000 records. Also had to get researchers to sift through reports and manually verify what the data meant. Make sure if you do that that you organise a system that everyone uses to categorise, calculate and tabulate. Can then use Excel and filter. Quicker with Access.
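As a rough illustration of "categories of deaths by time", here is a minimal Python sketch that tallies a death count per month and category. The record layout, the category labels and the toy rows are all invented for the example; they are not taken from the logs or from the team's actual tooling, which the notes say was Tableau, Excel and Access.

```python
from collections import Counter
from datetime import date


def tally_by_month(records):
    """Sum deaths per (YYYY-MM, category) pair.

    Each record is a hypothetical (date, category, deaths) tuple.
    """
    counts = Counter()
    for day, category, deaths in records:
        counts[(day.strftime("%Y-%m"), category)] += deaths
    return counts


# Made-up rows, purely to show the shape of the calculation.
TOY_RECORDS = [
    (date(2006, 1, 3), "category A", 2),
    (date(2006, 1, 19), "category B", 1),
    (date(2006, 1, 25), "category A", 3),
    (date(2006, 2, 2), "category B", 4),
]

totals = tally_by_month(TOY_RECORDS)
for (month, category), deaths in sorted(totals.items()):
    print(month, category, deaths)
```

Keeping the category labels in one agreed list is the code equivalent of the shared system for categorising that the notes recommend.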

Data was used as part of research not just to make loads of charts. Visual maps tell a story. Quite powerful to an audience. Maps can be used for newsgathering. Asked journalists which areas they were interested in and sent them the reports geocoded. They could read up on the plane all the reports in the area they were heading to. Can also link a big story to its log. Prove it to be true. The log can validate a report, so you can use it.

What Did it Take?

10 weeks. 25 people. 30,000 reports. 5,000 reports manually recounted. More than one 18-hour day.

ScraperWiki

A lot of really useful information is not easily available on the web. Writing a web scraper not only makes searching and viewing information better but it can bring stories to light which were hidden in the mass of digital structures.
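To give a sense of what a scraper does, here is a minimal Python sketch that pulls the rows out of an HTML table using only the standard library. It is not ScraperWiki's own API; the `TableScraper` class and the sample page are hypothetical, and a real scraper would first download the page rather than read it from a string.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Invented sample page standing in for a real download.
PAGE = (
    "<table>"
    "<tr><th>Council</th><th>Spend</th></tr>"
    "<tr><td>North</td><td>1200</td></tr>"
    "</table>"
)

scraper = TableScraper()
scraper.feed(PAGE)
print(scraper.rows)
```

Once the rows are in a plain list they can be saved, searched or compared over time, which is where hidden stories tend to surface.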

November 12 2010

15:00

Hacking data all night long: A NYC iteration of the hackathon model

In the main room of the Eyebeam Art and Technology Center’s massive 15,000-square foot office and lab space in the Chelsea neighborhood of Manhattan, more than sixty developers, designers, and journalists pore over their computer screens. A jumble of empty coffee cups and marked up scraps of butcher paper litter the tabletops while networks of power cords fan out underneath.

The event is The Great Urban Hack, a two-day overnight hackathon, organized by the meetup group Hacks/Hackers, that took place over an intense 30-hour stretch this past weekend. Starting early Saturday morning journalists and developers came together to “design, report on, code and create projects to help New Yorkers get the information they need while strengthening a sense of community.”

The eleven teams that participated in the event worked on a varied set of projects that ranged in scope from collaborative neighborhood mapping to live action urban gaming.

Rearranging and visualizing data

The team that worked on the project “Who’s My Landlord?,” based off of Elizabeth Dwoskin’s article of the same name in the Village Voice last Wednesday, concerned itself with the task of helping residents determine who owns a given piece of property. Dwoskin’s article points out that for many of the most derelict buildings in the city this link is obfuscated, a huge barrier for city agencies in their task of regulation to protect tenants. The team built a tool that draws from three databases: two from the city to pull the names of building owners, and one state database to look up the address of the owner when there is an intermediate company.
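The chained lookup described above might look something like the following Python sketch. The dictionaries stand in for the two city databases and the state database; their names, their contents and the `find_owner` function are assumptions made for illustration, not the team's actual code or the real data sources.

```python
def find_owner(address, city_owners, city_registrations, state_companies):
    """Resolve a building address to an owner name and contact address.

    The two city sources are tried in turn for an owner name. If that
    owner is a company listed in the state source, its registered
    address is returned as well; otherwise the contact address is None.
    """
    owner = city_owners.get(address) or city_registrations.get(address)
    if owner is None:
        return None
    return {"owner": owner, "contact_address": state_companies.get(owner)}


# Invented stand-ins for the three databases.
CITY_OWNERS = {"12 Example St": "Example Holdings LLC"}
CITY_REGISTRATIONS = {"34 Sample Ave": "J. Doe"}
STATE_COMPANIES = {"Example Holdings LLC": "99 Registered Agent Rd"}

result = find_owner("12 Example St", CITY_OWNERS, CITY_REGISTRATIONS, STATE_COMPANIES)
print(result)
```

The second hop through the state records is the step that matters when an intermediate company hides the owner behind its own name.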

Several groups worked on visualizations of some form of city data. The “Drawing Conclusions” team created a “Roach Map” using the raw data set of restaurant inspection results from the NYC Data Mine. The group wrote a script that scans the data line-by-line and counts each violation by zip code. They then analyze the data, taking into account variation in the number of inspections across zip codes, and plot it on a map of the city which auto-generates every week.
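A line-by-line count of that kind could be sketched in Python as follows. The column names (`zip`, `inspection_id`, `violation`) and the sample rows are invented, and dividing violations by the number of inspections is only one plausible way to account for uneven inspection counts; the article does not say which adjustment the team used.

```python
import csv
import io
from collections import defaultdict


def violation_rates(rows):
    """Return violations per inspection for each zip code."""
    violations = defaultdict(int)
    inspections = defaultdict(set)
    for row in rows:
        zip_code = row["zip"]
        # Several violation rows can belong to a single inspection.
        inspections[zip_code].add(row["inspection_id"])
        if row["violation"]:
            violations[zip_code] += 1
    return {z: violations[z] / len(ids) for z, ids in inspections.items()}


# Made-up data in a hypothetical layout, not the NYC Data Mine schema.
SAMPLE = """zip,inspection_id,violation
10001,a1,roaches
10001,a1,mice
10001,a2,
10002,b1,roaches
10002,b2,
"""

rates = violation_rates(csv.DictReader(io.StringIO(SAMPLE)))
print(rates)
```

Without the division, a zip code that is simply inspected more often would look worse on the map than it is.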

How hackathons work is simple: They define goals and create artificial constraints (like time) to catalyze the process of innovation. The closest journalistic equivalent might be the collaborative rush of a good newsroom working a big breaking story. But is this really the best environment to incubate projects of a journalistic nature? What are the different circumstances that foster the healthiest practices of innovation? And what is the best way to set expectations for an event like this?

The hackathon model

Hackathons like this are a growing trend. A lot can be said for bringing these groups together and into a space outside of their normal work environment. What’s maybe most fascinating to me is the opportunity for cultural interplay as these two groups find themselves more and more immersed in each other’s creative work. As John Keefe, one of the hosts of the event and a senior producer at WNYC, says: “It’s not really journalistic culture to come together and build stuff like this.”

Chrys Wu, a co-organizer of Hacks/Hackers and both a journalist and developer, talked about the groups’ different philosophies of sharing information: “Your traditional reporter has lots and lots of notes, especially if they’re a beat reporter. There’s also their rolodex or contacts database, which is extremely valuable and you wouldn’t want to necessarily share that. But there are pieces of things that you do that you can then reuse or mine on your own…at the same time technologists are putting up libraries of stuff, they say: ‘I’m not going to give you the secret sauce but I’m definitely going to give you the pieces of the sandwich.’”

Lots of questions remain: what is the best way to define the focus or scope for an event like this? Should they be organized around particular issues and crises? And what’s the best starting point for a journalistic project? Is it with a problem, a data set, a question, or as in the case of the landlord project: the research of a journalist? For all of the excitement around hackathons, this seems like just the beginning.

Photo by Jennifer 8. Lee used under a Creative Commons license.

November 09 2010

15:00

Loose ties vs. strong: Pinyadda’s platform finds that shared interests trump friendships in “social news”

There isn’t a silver bullet for monetizing digital news, but if there were, it would likely involve centralization: the creation of a single space where the frenzied aspects of our online lives — information sharing, social networking, exploration, recommendation — live together in one conveniently streamlined platform. A Boston-based startup called Pinyadda wants to be that space: to make news a pivotal element of social interaction, and vice versa. Think Facebook. Meets Twitter. Meets Foursquare. Meets Tumblr. Meets Digg.

Owned by Streetwise Media — the owner as well of BostInnovation, the Boston-based startup hub — Pinyadda launched last year with plans to be a central, social spot for gathering, customizing, and sharing news and information. The idea, at first, was to be an “ideal system of news” that would serve users in three ways:

1. it should gather information from the sites and blogs they read regularly;

2. it should mimic the experience of receiving links and comments from the people in their personal networks; and

3. it should be continually searching for information about subjects they were interested in. This pool of content could then be ranked and presented to users in a consistent, easily browsed stream.

Again, centralization. And a particular kind of centralization: a socialized version. Information doesn’t simply want to be free, the thinking went; it also wants to be social. The initial idea for Pinyadda was that leveraging the social side of the news — making it easy to share with friends; facilitating conversations with them — would also be a way to leverage the value of news. Which ties into the conventional wisdom about the distributive power of social news. In her recent NYRB review of The Social Network, Zadie Smith articulates that wisdom when it comes to Facebook’s Open Graph — a feature, she wrote, that “allows you to see everything your friends are reading, watching, eating, so that you might read and watch and eat as they do.”

What Pinyadda’s designers have discovered, though, is that “social” news doesn’t necessarily mean “shared with friends.” Instead, Pinyadda has found that extra-familiar relationships fuel news consumption and sharing in its network: Social news isn’t about the people you know so much as the people with whom you share interests.

Pinyadda’s business model was based on the idea that the social approach to news — and the personalization it relied on — would allow the platform to create a new value-capture mechanism for news. The platform itself, its product design and development lead, Austin Gardner-Smith, told me — with its built-in social networks and its capacity for recommendation and conversation — bolsters news content’s value with the experiential good that is community — since a “central point of consumption” tends to give the content being consumed worth by proximity.

The idea, in other words, was to take a holistic approach to monetization. Pinyadda aimed to take advantage of the platform’s built-in capacity for personalization — via behavioral tracking, or, less nefariously, paying attention to their individual users — to sell targeted ads against its content. “Post-intent” advertising is interest-based advertising — and thus, the thinking goes, more effective/less annoying advertising. That thinking still holds; in fact, the insight that common interests, rather than familiarity, fuel news consumption could ratify it. As Dan Kennedy put it, writing about the startup after they presented at a Hacks/Hackers meetup this summer: “Pinyadda may be groping its way toward a just-right space between Digg (too dumb) and NewsTrust (too hard).” The question will be whether news consumers, so many of them already juggling relationships with Facebook and Twitter and Tumblr and Posterous and other such sites, can make room for another one. And the extent to which the relationships fostered in those networks — connections that are fundamentally personal — are the types that drive the social side of news.

October 26 2010

10:02

Hacks/Hackers London meetup to discuss Iraq War logs

ScraperWiki will be supporting the November Hacks/Hackers London meetup at 7pm on Wednesday 24th November 2010 at The Irish Club, 2-4 Tudor Street, EC4Y 0AA, London. A few tickets are still available, but places are filling fast.

Schedule

  • 7.00pm: The data journalism behind the Iraq War Logs James Ball, Bureau of Investigative Journalism

James, Development Producer for the Bureau of Investigative Journalism and Chief Data Analyst on the TBIJ/Channel 4 Dispatches investigation into the Iraq War Logs, will explain how data journalism powered the process.

  • 7.30pm: TBC
  • 8pm: Social!

September 29 2010

14:00

Meta! Here’s how Storify looks telling the story of Storify

At the TechCrunch Disrupt conference this week, one of the new tools to emerge — besides, that is, Lark, the new app that wakes you “silently, without a jarring alarm” — was Storify. Founded by Burt Herman (a former AP reporter and founder of the journotech meetup group Hacks/Hackers) and developer/entrepreneur Xavier Damman, the platform promises a new way to leverage the real-time power of social media for creating stories. It’s doubling down on the increasingly common assumption that the future of news will demand curation on the part of news producers.

How does it work? With the caveat that the platform’s still in closed beta, it seems only appropriate to write the rest of this story using Storify.

Conclusion? The platform, at least in its current beta stage, might not be ideal for longer, text-heavy stories: The text field is a bit clunky, and the modular system lends itself more to narrative interruption than to flow. Still, the multimedia presentation aspect, used smartly, could be a refreshing counterpart to more traditional, text-heavy stories. (See for example, Penn professor and Wired blogger Tim Carmody’s engagingly Storified tale of a follower (re)quest.) And, for breaking news, where journalists might just be interested in the quick curation of tweets and videos, Storify’s drag-and-drop simplicity could be amazingly useful. It’s a simple mechanism for curating and contextualizing the atomized tumult that is the web — a little lifesaver for selected bits of information that otherwise might be lost to the news river’s rapids. Because, as Herman puts it, “stories are what last.”

September 13 2010

12:13

The first Birmingham Hacks/Hackers meetup – Monday Sept 20

Those helpful people at Hacks/Hackers have let me set up a Hacks/Hackers group for Birmingham. This is basically a group of people interested in the journalistic (and, by extension, the civic) possibilities of data. If you’re at all interested in this and think you might want to meet up in the Midlands sometime, please join up.

I’ve also organised the first Hacks/Hackers meetup for Birmingham on Monday September 20, in Coffee Lounge from 1pm into the evening.

Our speaker will be Caroline Beavon, an experienced journalist who caught the data bug on my MA in Online Journalism (and whose experiences I felt would be accessible to most). In addition, NHS Local’s Carl Plant will be talking briefly about health data and Walsall Council’s Dan Slee about council data.

All are welcome and no technical or journalistic knowledge is required. I’m hoping we can pair techies with non-techies for some ad hoc learning.

If you want to come, RSVP at the link.

PS: There’s also a Hacks/Hackers in London, and one being planned for Manchester, I’m told.

June 21 2010

17:30

Hacks/Hackers, Mozilla team up for Peer-to-Peer course

One of the standing features of Knight’s Future of News and Civic Media conference is an award for collaborations that arise during the conference. And one winner this year was Hacks/Hackers — the journalists-and-programmers Meetup-group-turned-veritable-movement — and Mozilla, the open-source-oriented nonprofit. Together, the two groups will create a course through Peer-to-Peer University, with the aim of collective education: the hackers teaching the journos, and vice versa.

“We thought this was a perfect fit with Hacks and Hackers,” says Burt Herman, the group’s founder. “We have journalists teaching technology people about what that is, and the technology people teaching journalists.” The class fits in perfectly with that mission, he told me. “I think everybody is coming more and more to the realization that you need both sides of this to make something that works. It’s about great reporting, great writing, photos, video, content — coupled with amazing technology and innovation to help reporting and to present this to audiences.”

The class will be a six-week commitment, with one hour a week of lectures and one project. It will cover a broad range of topics, and instructors will (tentatively) include NYT interactive guru and Hacks/Hackers honcho Aron Pilhofer, Amanda Hickman (teaching about mapping), and David Cohn (instructing students on online collaboration). “And we’ve talked to a bunch of other Knight News Challenge winners about doing classes each week on data journalism, on online collaboration, on new business models for news,” Herman says. So “we’re looking forward to getting some interesting people…doing some training things. Which people have definitely asked for — on both sides.”

So what’s the ultimate goal — of the course, and of Hacks/Hackers more broadly? “The vision for Hacks and Hackers is to go beyond just Meetups, and to have people collaborating and doing things,” Herman says. (See, for example, last month’s KQED Hackathon, through which developer/journo teams built 12 new iPad apps in a period of 48 hours.) “Maybe people start companies out of these collaborations, maybe this is where new news organizations are born, or ideas that can help feed innovation,” Herman says. “Because it’s sort of outside any one news organization, that means we have the freedom to do what we want to — and that’s really what you need to innovate.”
