Tumblelog by Soup.io

August 20 2012

13:34

How Wikipedia Manages Sources for Breaking News

Almost a year ago, I was hired by Ushahidi to work as an ethnographic researcher on a project to understand how Wikipedians managed sources during breaking news events.

Ushahidi cares a great deal about this kind of work because of a new project called SwiftRiver that seeks to collect and enable the collaborative curation of streams of data from the real-time web about a particular issue or event. If another Haiti earthquake happened, for example, would there be a way for us to filter out the irrelevant, the misinformation, and build a stream of relevant, meaningful and accurate content about what was happening for those who needed it? And on Wikipedia's side, could the same tools be used to help editors curate a stream of relevant sources as a team rather than individuals?

Ranking sources

When we first started thinking about the problem of filtering the web, we naturally thought of a ranking system that would rank sources according to their reliability or veracity. The algorithm would consider a variety of variables involved in determining accuracy, as well as whether sources have been chosen, voted up or down by users in the past, and eventually be able to suggest sources according to the subject at hand. My job would be to determine what those variables are -- i.e., what were editors looking at when deciding whether or not to use a source?

I started the research by talking to as many people as possible. Originally I was expecting that I would be able to conduct 10 to 20 interviews as the focus of the research, finding out how those editors went about managing sources individually and collaboratively. The initial interviews enabled me to hone my interview guide. One of my key informants urged me to ask questions about sources not cited as well as those cited, leading me to one of the key findings of the report (that the citation is often not the actual source of information and is often provided in order to appease editors who may complain about sources located outside the accepted Western media sphere). But I soon realized that the editors with whom I spoke came from such a wide variety of experience, work areas and subjects that I needed to restrict my focus to a particular article in order to get a comprehensive picture of how editors were working. I chose a 2011 Egyptian revolution article on Wikipedia because I wanted a globally relevant breaking news event that would have editors from different parts of the world working together on an issue with local expertise located in a language other than English.

Using Kathy Charmaz's grounded theory method, I chose to focus on editing activity (in the form of talk pages, edits, statistics and interviews with editors) from January 25, 2011 when the article was first created (within hours of the first protests in Tahrir Square), to February 12 when Mubarak resigned and the article changed its name from "2011 Egyptian protests" to "2011 Egyptian revolution." After reviewing the big-picture analyses of the article using Wikipedia statistics of top editors, and locations of anonymous editors, etc., I started work with an initial coding of the actions taking place in the text, asking the question, "What is happening here?"

I then developed a more limited codebook using the most frequent/significant codes and proceeded to compare different events with the same code (looking up relevant edits of the article in order to get the full story), and to look for tacit assumptions that the actions left out. I did all of this coding in Evernote because it seemed the easiest (and cheapest) way of importing large amounts of textual and multimedia data from the web, but it wasn't ideal: talk pages need to be re-formatted when imported, and I ended up using a single column to code the data in the first column, since putting each talk-page conversation in its own cell would have been too time-consuming.

I then moved to writing a series of thematic notes on what I was seeing, trying to understand, through writing, what the common actions might mean. I finally moved to the report writing, bringing together what I believed were the most salient themes into a description and analysis of what was happening according to the two key questions that the study was trying to ask: How do Wikipedia editors, working together, often geographically distributed and far from where an event is taking place, piece together what is happening on the ground and then present it in a reliable way? And how could this process be improved?

Key variables

Ethnography Matters has a great post by Tricia Wang that talks about how ethnographers contribute (often invisible) value to organizations by showing what shouldn't be built, rather than necessarily improving a product that already has a host of assumptions built into it.

And so it was with this research project that I realized early on that a ranking system conceptualized this way would be inappropriate -- for the single reason that along with characteristics for determining whether a source is accurate or not (such as whether the author has a history of presenting accurate news articles), a number of important variables are independent of the source itself. On Wikipedia, these include variables such as the number of secondary sources in the article (Wikipedia policy calls for editors to use a majority of secondary sources), whether the article is based on a breaking news story (in which case the majority of sources might have to be primary, eyewitness sources), or whether the source is notable in the context of the article. (Misinformation can also be relevant if it is widely reported and significant to the course of events, as Judith Miller's New York Times stories were for the Iraq War.)

This means that you could have an algorithm for determining how accurate the source has been in the past, but whether you make use of the source or not depends on factors relevant to the context of the article that have little to do with the reliability of the source itself.
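To make that distinction concrete, here is a minimal Python sketch. It is illustrative only and not drawn from the study: the `Source` and `ArticleContext` fields, the 0.7 accuracy threshold and the "keep secondary sources in the majority" rule are all assumptions chosen to mirror the variables described above.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    past_accuracy: float  # hypothetical 0..1 score from some accuracy model
    is_primary: bool      # eyewitness/primary vs. secondary reporting


@dataclass
class ArticleContext:
    is_breaking_news: bool
    secondary_count: int   # secondary sources already cited in the article
    primary_count: int     # primary sources already cited in the article
    widely_reported: bool  # is the source's claim itself notable to events?


def should_use(source: Source, ctx: ArticleContext, min_accuracy: float = 0.7) -> bool:
    """Decide whether to cite a source. All rules here are illustrative assumptions."""
    # Widely reported misinformation can still be relevant to the article.
    if ctx.widely_reported:
        return True
    if source.past_accuracy < min_accuracy:
        return False
    # Outside breaking news, only add a primary source if secondary
    # sources would still be in the majority afterwards.
    if source.is_primary and not ctx.is_breaking_news:
        return ctx.secondary_count > ctx.primary_count + 1
    return True
```

The same source, with the same accuracy score, can get a different answer depending only on the article it is being considered for, which is why a single per-source rank says little about whether to cite it.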

Another key finding recommending against source ranking is that Wikipedia's authority originates from its requirement that each potentially disputed phrase be backed up by reliable sources that can be checked by readers, whereas source ranking necessarily requires that the calculation be invisible in order to prevent gaming. It is already a source of potential weakness that Wikipedia citations are not the original source of information (since editors often choose citations that will be deemed more acceptable to other editors), so further hiding how sources are chosen would disrupt this important value.

On the other hand, having editors provide a rationale behind the choice of particular sources, as well as showing the full variety of sources rather than only those that survived loading-time constraints, may be useful -- especially since these discussions do often take place on talk pages but are practically invisible because they are difficult to find.

Wikipedians' editorial methods

Analyzing the talk pages of the 2011 Egyptian revolution article case study enabled me to understand how Wikipedia editors set about the task of discovering, choosing, verifying, summarizing, adding information and editing the article. It became clear through the rather painstaking study of hundreds of talk pages that editors were:

  1. storing discovered articles either using their own editor domains by putting relevant articles into categories or by alerting other editors to breaking news on the talk page,
  2. choosing sources by finding at least two independent sources that corroborated what was being reported but then removing some of the citations as the page became too heavy to load,
  3. verifying sources by finding sources to corroborate what was being reported, by checking what the summarized sources contained, and/or by waiting to see whether other sources corroborated what was being reported,
  4. summarizing by taking screenshots of videos and inserting captions (for multimedia) or by choosing the most important events of each day for a growing timeline (for text),
  5. adding text to the article by choosing how to reflect the source within the article's categories and providing citation information, and
  6. editing by disputing the way that editors reflected information from various sources and replacing primary sources with secondary sources over time.
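
The "at least two independent sources" rule in steps 2 and 3 can be sketched in a few lines of Python. This is a simplified illustration, not a description of any existing tool: the function name is hypothetical, and treating "different publisher" as "independent" is an assumption that real editors would weigh far more carefully.

```python
from collections import defaultdict


def corroborated_claims(reports, min_independent=2):
    """Return the claims reported by at least `min_independent` distinct publishers.

    `reports` is an iterable of (claim, publisher) pairs. Counting distinct
    publishers as independent sources is a simplifying assumption.
    """
    publishers_by_claim = defaultdict(set)
    for claim, publisher in reports:
        publishers_by_claim[claim].add(publisher)
    return {
        claim
        for claim, publishers in publishers_by_claim.items()
        if len(publishers) >= min_independent
    }
```

A claim repeated twice by the same outlet stays uncorroborated, while a claim carried by two different outlets passes the check.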

It was important to discover the work process that editors were following because any tool that assisted with source management would have to accord as closely as possible with the way that editors like to do things on Wikipedia. Since the process is managed by volunteers and because volunteers decide which tools to use, this becomes really critical to the acceptance of new tools.

Recommendations

After developing a typology of sources and isolating different types of Wikipedia source work, I made two sets of recommendations as follows:

  1. The first would be for designers to experiment with exposing variables that are important for determining the relevance and reliability of individual sources as well as the reliability of the article as a whole.
  2. The second would be to provide a trail of documentation by replicating the work process that editors follow (somewhat haphazardly at the moment) so that each source is provided with an independent space for exposition and verification, and so that editors can collect breaking news sources collectively.

Regarding a ranking system for sources, I'd argue that a descriptive repository of major media sources from different countries would be incredibly beneficial, but that a system for determining which sources are ranked highest according to usage would yield really limited results. (We know, for example, that the BBC is the most used source on Wikipedia by a high margin, but that doesn't necessarily help editors in choosing a source for a breaking news story.) Exposing the variables used to determine relevancy (rather than adding them up in invisible amounts to come up with a magical number) and showing the progression of sources over time offers some opportunities for innovation. But this requires developers to think out of the box in terms of what sources (beyond static texts) look like, where such sources and expertise are located, and how trust is garnered in the age of Twitter. The full report provides details of the recommendations and the findings and will be available soon.
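
One way to picture "exposing the variables" instead of summing them is a record that is displayed field by field. The Python sketch below is only an assumption about what such a record might hold: `SourceProfile` and its fields are hypothetical and are not taken from the report.

```python
from dataclasses import asdict, dataclass


@dataclass
class SourceProfile:
    outlet: str
    country: str
    language: str
    source_type: str  # e.g. "primary" or "secondary"
    rationale: str    # why an editor chose this source


def describe(profile: SourceProfile) -> str:
    """Render every variable by name rather than collapsing them into one score."""
    return "\n".join(f"{key}: {value}" for key, value in asdict(profile).items())
```

Because nothing is added up, readers and other editors can see and dispute each variable on its own, including the editor's stated rationale.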

Just the beginning

This is my first comprehensive ethnographic project, and one of the things I've noticed when doing other design and research projects using different methodologies is that, although the process can seem painstaking and it can prove difficult to articulate the hundreds of small observations into findings that are actionable and meaningful to designers, getting close to the experience of editors is extremely valuable work that is rare in Wikipedia research. I realize now that, until I actually studied an article in detail, I knew very little about how Wikipedia works in practice. And this is only the beginning!

Heather Ford is a budding ethnographer who studies how online communities get together to learn, play and deliberate. She currently works for Ushahidi and is studying how online communities like Wikipedia work together to verify information collected from the web and how new technology might be designed to help them do this better. Heather recently graduated from the UC Berkeley iSchool where she studied the social life of information in schools, educational privacy and Africans on Wikipedia. She is a former Wikimedia Foundation Advisory Board member and the former Executive Director of iCommons - an international organization started by Creative Commons to connect the open education, access to knowledge, free software, open access publishing and free culture communities around the world. She was a co-founder of Creative Commons South Africa and of the South African nonprofit, The African Commons Project as well as a community-building initiative called the GeekRetreat - bringing together South Africa's top web entrepreneurs to talk about how to make the local Internet better. At night she dreams about writing books and finding time to draw.

This article also appeared at Ushahidi.com and Ethnography Matters. Get the full report at Scribd.com.

August 14 2012

14:00

What's Next for Ushahidi and Its Platform?

This is part 2 in a series. In part 1, I talked about how we think of ourselves at Ushahidi and how we think of success in our world. It set up the context for this post, which is about where we're going next as an organization and with our platform.

We realize that it's hard to understand just how much is going on within the Ushahidi team unless you're in it. I'll try to give a summarized overview, and will answer any questions through the comments if you need more info on any of them.

The External Projects Team

Ushahidi's primary source of income is private foundation grant funding (Omidyar Network, Hivos, MacArthur, Google, Cisco, Knight, Rockefeller, Ford), and we don't take any public funding from any country so that we are more easily able to maintain our neutrality. Last year, we embarked on a strategy to diversify our revenue stream, endeavoring to decrease our percentage of revenues based on grant funding and offset that with earned revenue from client projects. This turned out to be very hard to do within our current team structure, as the development team ended up being pulled off of platform-side work and client-side work suffered for it. Many internal deadlines were missed, and we found ourselves unable to respond to the community as quickly as we wanted.

This year we split out an "external projects team" made up of some of the top Ushahidi deployers in the world, and their first priority is to deal with client and consulting work, followed by dev community needs. We're six months into this strategy, and it seems like this team format will continue to work and grow. Last year, 20% of our revenue was earned; this year we'd like to get that to the 30-40% range.

Re-envisioning Crowdmap

When anyone joins the Ushahidi team, we tend to send them off to some conference to speak about Ushahidi in the first few weeks. There's nothing like knowing that you're going to be onstage talking about your new company to galvanize you into really learning about and understanding everything about the organization. Basically, we want you to understand Ushahidi and be on the same mission with us. If you are, you might explain what we do in a different way than I do onstage or in front of a camera, but you'll get the right message out regardless.

You have a lot of autonomy within your area of work, or so we always claimed internally. This was tested earlier this year, when David Kobia, Juliana Rotich and I, as founders, were forced to ask whether we were serious about that claim, or were just paying it lip service. Brian Herbert leads the Crowdmap team, which in our world means he's in charge of the overall architecture, strategy and implementation of the product.

The Crowdmap team met up in person earlier this year and hatched a new product plan. They re-envisioned what Crowdmap could be, started mocking up the site, and began building what would be a new Crowdmap, a complete branch off the core platform. I heard this was underway, but didn't get a brief on it until about six weeks in. When I heard what they had planned, and got a complete walk-through by Brian, I was floored. What I was looking at was so different from the original Ushahidi, and thus what we have currently as Crowdmap, that I couldn't align the two in my mind.

My initial reaction was to shut it down. Fortunately, I was in the middle of a random 7-hour drive between L.A. and San Francisco, so that gave me ample time to think by myself before I made any snap judgments. More importantly, it also gave me time to call up David and talk through it with him. Later that week, Juliana, David and I had a chat. It was at that point that we realized that, as founders, we might have blinders on of our own. Could we be stuck in our own 2008 paradigm? Should we trust our team to set the vision for a product? Did the product answer the questions that guide us?

The answer was yes.

The team has done an incredible job of thinking deeply about Crowdmap users, then translating that usage into a complete redesign, which is both beautiful and functional at the same time. It's user-centric, as opposed to map-centric, which is the greatest change. But, after getting around our initial feelings of alienness, we are confident that this is what we need to do. We need to experiment and disrupt ourselves -- after all, if we aren't willing to take risks and try new things, then we fall into the same trap that those who we disrupted did.

A New Ushahidi

For about a year we've been asking ourselves, "If we rebuilt Ushahidi, with all we know now, what would it look like?"

To redesign, re-architect and rebuild any platform is a huge undertaking. Usually this means part of the team is left to maintain and support the older code, while the others are building the shiny new thing. It means that while you're spending months and months building the new thing, you appear stagnant and less responsive to the market. It means that you might get it wrong and what you build is irrelevant by the time it's launched.

Finally, after many months of internal debate, we decided to go down this path. We've started with a battery of interviews with users, volunteer developers, deployers and internal team members. Last week's blog post by Heather Leson on the design direction we're heading in shows where we're going. Ushahidi v3 is the complete redesign of Ushahidi's core platform, from the first line of code to the last HTML tag. On the front-end it's mobile web-focused out of the gate, and the backend admin area is about streamlining the publishing and verification process.

At Ushahidi we are still building, theming and using Ushahidi v2.x, and will continue to do so for a long time. This idea of a v3 is just vaporware until we actually decide to build it, but the exercise has already borne fruit because it forces us to ask what it might look like if we weren't constrained by the legacy structure we had built. We'd love to get more input from everyone on this as we go forward.

SwiftRiver in Beta

After a couple of fits and starts, SwiftRiver is now being tried out by 500-plus beta testers. It's 75% of the way to completion, but usable, and so it's out and we're getting the feedback from everyone on what needs to be changed, added and removed in order to make it the tool we all need to manage large amounts of data. It's an expensive, server-intensive platform to run, so those who use it in the future will have to pay for its use when using it on our servers. As always, the core code will be made available, free and open source, for those who would like to set it up and run it on their own.

In Summary

The amount of change, including internal change, that Ushahidi is undertaking is truly breathtaking to us. We're cognizant of just how much we're putting on the edge. However, we know this: in our world of technology, those who don't disrupt themselves will themselves be disrupted. In short, we'd rather go all-in to make this change happen ourselves than be mired in a state of stagnancy and defensive activity.

As always, this doesn't happen in a vacuum for Ushahidi. We've relied on those of you who are the coders and deployers to help us guide the platforms for over four years. Many of you have been a part of one of these product rethinks. If you aren't already, and would like to be, get in touch with me or Heather to get into it and help us re-envision and build the future.

Raised in Kenya and Sudan, Erik Hersman is a technologist and blogger who lives in Nairobi. He is a co-founder of Ushahidi, a free and open-source platform for crowdsourcing information and visualizing data. He is the founder of AfriGadget, a multi-author site that showcases stories of African inventions and ingenuity, and an African technology blogger at WhiteAfrican.com. He currently manages Ushahidi's operations and strategy, and is in charge of the iHub, Nairobi's Innovation Hub for the technology community, bringing together entrepreneurs, hackers, designers and the investment community. Erik is a TED Senior Fellow, a PopTech Fellow and speaker and an organizer for Maker Faire Africa. You can find him on Twitter at @WhiteAfrican

This post originally appeared on Ushahidi's blog.

June 15 2010

14:45

Teaching Ushahidi 101 in Kenya

This post is written by Melissa Tully and Rebecca Wanjiku. Melissa Tully is a PhD student at UW-Madison who is researching the use of social/new media in social justice work in Kenya. She has been volunteering with Ushahidi for the past two-and-a-half years.
Rebecca Wanjiku is a project assistant for Ushahidi in Kenya. She interfaces with many organizations and individuals who have inquiries about Ushahidi.

This month and last month saw the first ever "Ushahidi 101" events held at the iHub in Nairobi. The first Ushahidi 101 gathering took place on May 12 and attracted 16 people from different organizations in Nairobi. The session was designed to introduce Ushahidi to people who may not know how it works, discuss the site's features, highlight uses of Ushahidi by Kenyan organizations and give potential and current users a chance to network. The second 101 occurred on June 7, drew a crowd of 21 people, and featured a live stream of the event for those who could not attend.

Ushahidi is an open source platform that helps people map breaking news events and disasters in real time. It has its roots in a mapping project for reports of violence in Kenya after the post-election fallout at the beginning of 2008.

In both 101 sessions, current users shared their experiences with the group. They offered insight and lessons about deploying Ushahidi, including tips on marketing and strategy, as well as more technical advice such as setting up an SMS system.

Presentation and Breakout Sessions

The 101 sessions were broken up into two main parts: A presentation (which we've uploaded to SlideShare) and breakout sessions. The presentation gives a brief history of Ushahidi, basic information about the platform, and showcases different deployments, focusing on what's been done in Kenya.

In the June 101, we were lucky enough to have Erica Hagen from Map Kibera and Voice of Kibera in attendance to give a presentation on their work and how they have customized the Ushahidi platform to be a space for community reporting. In the May 101, Marten Schoonman from Media Focus on Africa Foundation shared his experiences working on two Ushahidi deployments: Unsung Peace Heroes and the ongoing Building Bridges project, which are both focused on peace and peace-building in Kenya. We also heard from Su Stephanou about her initiative to map organic farmers in Kenya, as well as members of the Kenya AIDS NGOs Consortium about their experience mapping TB, HIV and AIDS services.

It was great to have users of the platform at the event. During the breakout sessions they were able to discuss their experiences and field questions from attendees. These sessions were invaluable to get the networking rolling, and many conversations continued well past the end of the event. The three breakout sessions focused on: Basic Ushahidi back-end functionality; the more advanced technical components of Ushahidi; and publicizing and marketing the platform.

Overall, both Ushahidi 101s were a great success and it was amazing to have folks from so many different types of organizations present, including the International Federation of Red Cross and Red Crescent Societies (IFRC), ILO-Somalia, Association of Media Women in Kenya, and Oxfam, among others.

The 101s will now be monthly events at the iHub and the next Ushahidi 101 is scheduled for July 13. Also, on June 16, we'll have the first SwiftRiver 101 organized by Jon Gosier and the first Developers 101 on June 28. The SwiftRiver event will feature a basic overview of the platform, as well as a highly technical session for developers. The Developers 101 is for all the techies looking to get their hands dirty with some Ushahidi code. All of these events help to build the community around Ushahidi, a critical component for an open-source initiative like ours.
