
July 01 2011

15:00

ProPublica’s newest news app uses education data to get more social

Yesterday, the U.S. Department of Education’s Office for Civil Rights released a data set — the most comprehensive to date — documenting student access to advanced classes and special programs in U.S. public high schools. Shorthanded as the Civil Rights survey, the information tracks the access schools provide to offerings such as Advanced Placement courses, gifted-and-talented programs, and higher-level math and science classes, which studies suggest are important factors for educational attainment — and for success later in life.

ProPublica reporters used the Ed data to produce a story package, “The Opportunity Gap,” that analyzes the OCR info and other federal education data; their analysis found that, overall and unsurprisingly, high-poverty schools are less likely than their wealthier counterparts to have students enrolled in those beneficial programs. The achievement gap, the data suggest, isn’t just about students’ educational attainment; it’s also about the educational opportunities provided to them in the first place. And it’s individual states that are making the policy decisions that affect the quality of those opportunities. ProPublica’s analysis, says senior editor Eric Umansky, is aimed at answering one key question: “Are states giving their kids a fair shake?”

The fact that the OCR data set is relatively comprehensive — reporting on districts with more than 3,000 students, it covers 85,000 schools and around 75 percent of all public high schoolers in the U.S. — means that it is also enormous. And while ProPublica’s text-based takes on the info have done precisely the thing you’d want them to do — find surprises, find trends, make it meaningful, make it human — the outfit’s reporters wanted to go beyond the database-to-narrative formula with the OCR trove. Their solution: a news app that encourages, even more than your typical app, public participation. And that looks to Facebook for social integration.

The app focuses on measuring equal access on a broad scale: It tracks not only the educational opportunities provided by each school, but also the percentage of teachers with two years’ experience or less — who, as a group, tend to effect smaller attainment gains than their more experienced counterparts — and the percentage of students who receive free or reduced-price school lunch, an indicator of poverty. (More on the developers’ methodology here.)

ProPublica leads the field in developing news apps; each one requires careful thought about how users will actually navigate — and benefit from — the app. With this one, though, “we were focusing a lot more on what behaviors we wanted to encourage,” says Scott Klein, ProPublica’s editor of news applications. ProPublica is always thinking about how to organize reporters, both within and outside of its newsroom, around its stories, notes Amanda Michel, ProPublica’s director of distributed reporting. “Here, we wanted to take it one step further.”

With that in mind, the app invites both macro and micro analysis, with an implicit focus on personal relevance: You can parse the data by state, or you can drill down to individual schools and districts — the high school you went to, or the one that’s in your neighborhood. And then, even more intriguingly, you can compare schools according to geographical proximity and/or the relative wealth and poverty of their student bodies. (Cambridge Rindge and Latin School, just down the street from the Lab, has 1,585 students, 38 percent of whom receive a free or reduced-price lunch; Medfield Senior High, a few miles southwest of Cambridge, has 920 students and a 1-percent free/reduced lunch rate. Four percent of Rindge and Latin’s students are enrolled in advanced math courses; for Medfield High, the rate is 42 percent.) “It really is an auto-story generator,” Umansky says.
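
To make the “auto-story generator” idea concrete, here is a minimal Python sketch of a comparison like the one above. The School class and its field names are hypothetical, not ProPublica’s actual code; the figures are the Rindge and Latin and Medfield numbers cited in this paragraph.

```python
# A hypothetical data structure for school-level comparison; figures are
# from the Rindge and Latin / Medfield example above.
from dataclasses import dataclass

@dataclass
class School:
    name: str
    students: int
    free_lunch_pct: float      # % receiving free or reduced-price lunch
    advanced_math_pct: float   # % enrolled in advanced math courses

def compare(a: School, b: School) -> str:
    """Generate a one-line 'auto-story' contrasting two schools."""
    return (f"{a.name}: {a.free_lunch_pct:.0f}% free/reduced lunch, "
            f"{a.advanced_math_pct:.0f}% advanced math vs. "
            f"{b.name}: {b.free_lunch_pct:.0f}% free/reduced lunch, "
            f"{b.advanced_math_pct:.0f}% advanced math")

rindge = School("Cambridge Rindge and Latin", 1585, 38.0, 4.0)
medfield = School("Medfield Senior High", 920, 1.0, 42.0)
print(compare(rindge, medfield))
```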

And sharing — the Facebook aspect of the app — is a big part of the behavior ProPublica’s news apps team wanted to encourage. They considered social integration from a structural perspective, notes Al Shaw, the developer who authored the app, and worked with Vadim Lavrusik, Facebook’s Journalist Program Manager, to optimize the app-to-Facebook interface. One small-but-key feature: With that integration, users who are signed into Facebook can generate an individual URL for each cluster of data they dig up — the Rindge and Latin-versus-Medfield comparison, say — to make sharing and referencing the data almost seamless. The resulting page has a “share on Facebook” button along with a note: “Use this hashtag to share your insights on Twitter: #myschoolyourschool.”
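
One plausible way to build those per-cluster URLs is to serialize the selected schools into a stable query string, so the same comparison always yields the same shareable link. This is a sketch under assumptions: the base URL and parameter name are invented, not ProPublica’s actual scheme.

```python
# Sketch: a deterministic permalink for a school comparison. The URL and
# "schools" parameter are hypothetical.
from urllib.parse import urlencode

BASE = "https://projects.propublica.org/schools/compare"  # hypothetical

def permalink(school_ids):
    # Sort the IDs so the same pair of schools always yields the same URL.
    query = urlencode({"schools": ",".join(sorted(school_ids))})
    return f"{BASE}?{query}"

print(permalink(["ma-medfield-senior-high", "ma-cambridge-rindge-latin"]))
# -> https://projects.propublica.org/schools/compare?schools=ma-cambridge-rindge-latin%2Cma-medfield-senior-high
```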

The ed-data app isn’t social for its own sake; instead, it serves the broad and sometimes nebulous goal of having impact — on both a personal and a policy level. “We invest so much time into acquiring data and cleaning data and making sense of data,” Michel notes; ultimately, though, data doesn’t mean much unless people can understand how it immediately affects them, their communities, their kids. Its newest app, Michel says, is part of ProPublica’s broader strategy: to make data, overall, more social. (They’d like to do a similar integration with Twitter, too, she says.) The point is to find ways to marry social and story, to turn online interactions into their own kind of data sets so, she says, “people can layer their stories on top of them.”

December 21 2010

17:00

At #Niemanleaks, a new generation of tools to manage floods of new data

Whether it’s 250,000 State Department cables or the massive spending databases on Recovery.gov, the trend in data has definitely become “more.” That presents journalists with a new problem: How do you understand and explain data when it comes by the gigabyte? At the Nieman Foundation’s one-day conference on secrecy and journalism, presenters from the New York Times, Sunlight Foundation, and more offered solutions — or at least new ways of thinking about the problems.

Think like a scientist

With the massive amounts of primary documents now available, journalists have new opportunities to bring their readers into their investigations — which can lead to better journalism. John Bohannon, a contributing correspondent for Science Magazine, said his background as a scientist was great preparation for investigative reporting. “The best kind of investigative journalism is like the best kind of science,” he said. “You as the investigator don’t ask your readers to take your claims at face value: You give them the evidence you’ve gathered along the way and ask them to look at it with you.”

It’s not a radical idea, but it’s one being embraced in new ways. For Bohannon, it meant embedding with a unit in Afghanistan and methodically gathering first-hand data about civilian deaths — a more direct and reliable indicator than the less expensive and safer method of counting media-reported deaths. He also found that his scientific approach was met with more open answers from a military known for tight information control. “Sometimes if you politely ask for information, large powerful organizations will actually give it to you,” he said.

The future will be distributed: BitTorrent, not Napster

Two of the projects discussed, Basetrack and DocumentCloud, invite broader participation in the news process, the former in sourcing and the latter in distribution.

Basetrack, a Knight News Challenge winner, goes beyond the normal embedding process to more actively involve the Marines of First Battalion, Eighth Marine Regiment in reporting their experiences as they deploy overseas. Teru Kuwayama, who leads the project and deployed with the battalion to Afghanistan, said ensuring that confidential information wasn’t released, putting lives in danger, was essential to building trust and openness with the project. So Basetrack built a “Denial of Information” tool that allows easy, pre-publication redactions, with the caveat that the fact of those redactions — and the reasons given for them — would be made public. It’s a compromise that promises greater intimacy and a collaborative look at life at war while ensuring the safety of the Marines.
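
The bookkeeping the article describes (withhold the sensitive text, publish the fact of the redaction and the stated reason) suggests a simple record structure. A hedged sketch follows; the class and field names are guesses, not Basetrack’s actual “Denial of Information” tool.

```python
# Sketch of redaction bookkeeping: the withheld text stays private, but the
# fact of the redaction and its reason are published. Names are hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass
class Redaction:
    post_id: str
    reason: str            # published alongside the post
    redacted_on: date
    redacted_text: str     # withheld from publication

    def public_record(self) -> dict:
        """Everything except the withheld text is made public."""
        return {"post_id": self.post_id,
                "reason": self.reason,
                "redacted_on": self.redacted_on.isoformat()}

r = Redaction("patrol-2010-11-03", "identifies a unit's position",
              date(2010, 11, 4), "[withheld]")
print(r.public_record())
```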

Fellow News Challenge winner DocumentCloud, on the other hand, distributes the primary documents dug up through traditional investigative journalism, such as historical confidential informant files or flawed electoral ballot designs. Aron Pilhofer, editor of interactive news at The New York Times, said he was unsure about whether journalists would actually use it when his team began working on the project — but since then dozens of organizations have embraced it, happy to take readers along for the ride of the investigative process.

These new ways of distributing reporting are just the beginning, Pilhofer said, part of a trend that will likely push today’s marquee whistleblower out of the limelight. “WikiLeaks was very much a funnel going in and very much a funnel going out,” he said. “Distributed is the future.” A new project, called OpenLeaks, will embrace a less centralized model, building technology to allow anonymous leaks without a central organization that can be taken out.

Big data’s day is here

The panel also tackled how to digest truly massive data sets. Bill Allison, editorial director of the Sunlight Foundation, detailed how his organization collected information on everything from earmarks to political fundraising parties. Allison said making this data actually meaningful required context, which could be as simple as mapping already-available data or scoring government databases on understandable criteria.

“We try to make the information easy to use,” he said. But beyond the audience of curious constituents, a much broader audience is reached as hundreds of journalists around the country use Sunlight’s tools to dig up local stories they might not otherwise have noticed — creating a ripple effect of transparency.

December 01 2010

15:30

Keeping track of political candidates online: Web archiver Perpetually follows the digital campaign trail

There is one huge, almost infinitely wide memory gap in our culture that can be summed up with this question: Where does the Internet go when it dies? Not the whole Internet, but the individual websites and pages that every day are modified and deleted, discarded and cached. Who can a journalist turn to when needing to look up the older version of a website, a retired blog, or a deleted Facebook post?

It turns out, not many people. The hole that Nexis plugs for academic papers and the newspapers of the world has few equivalents online. The Internet Archive’s once excellent Wayback Machine — an attempt at a complete, Library of Congress-worthy web archive — is now fairly useless in today’s social-media-driven web world, storing a slipshod record of photos, multimedia, and basically anything that’s not Web 1.0, and, on top of that, taking up to a year for updates to appear in its index after its spider has crawled a site.

This election season, as candidates propped up their digital campaign booths online with Twitter feeds and new, snazzy websites, Darrell Silver, founder of the Perpetually Public Data Project, realized this was actually kind of terrifying. For all the thousands of reporters following candidates’ buses and rallies, there was no mechanism to follow the campaign trail online. Anything pledged on a candidate’s website could be wiped out with the click of a mouse — and without so much as a peep.

To fill this collective memory hole, for the 2010 midterms Perpetually archived the websites — and the Facebook, MySpace, and Twitter accounts — of every politician it could find: all the major candidates in all 435 House and 37 Senate races. And it archived every change, at every second of every minute: Flash, blog posts, photos, whatever — with the exception of YouTube videos, which Silver decided to discard because of copyright conflicts.
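
Archiving every change implies a poll-and-compare loop: fetch each tracked page, hash the body, and store a new timestamped snapshot only when the content differs from the last one seen. Here is a minimal sketch of that general technique; it is an assumption about the approach, not Perpetually’s implementation.

```python
# Sketch of an archive-on-change loop: snapshot a page only when its content
# hash differs from the last stored version. Not Perpetually's actual code.
import hashlib
import time
import urllib.request

last_hash = {}   # url -> sha256 of the most recent snapshot
snapshots = []   # (timestamp, url, body) records

def poll(urls):
    for url in urls:
        body = urllib.request.urlopen(url).read()
        digest = hashlib.sha256(body).hexdigest()
        if last_hash.get(url) != digest:        # content changed (or is new)
            snapshots.append((time.time(), url, body))
            last_hash[url] = digest

# poll(["https://example.com/candidate-site"])  # run on a schedule
```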

The result has been a great experiment that’s made at least one news splash and brought the technology onto the Huffington Post. After the election, Silver added every congressperson — newly elected or not — and every governor to Perpetually’s database. Imagine the difference at some point in the future: Anyone will be able to zoom into any point in the past, load up a politician’s website, and see how things stood on any given day in any given year. And then, with a few more clicks, to scroll through the politician’s history and build a larger story about the politician over a career-length timespan. “We’re trying to be the undebatable reference point for the source material and the proof of what happened when,” Silver said.

The site, thus far, has given that goal its best shot, though it has a way to go. In terms of the breadth of his archive and the depth of its storage, Silver is peerless. Versionista, the change-tracking service ProPublica has used for its ChangeTracker project, is perhaps his closest competitor, but for now it doesn’t track candidate sites, only Whitehouse.gov. Moreover, the Versionista platform shows only specific HTML-level changes, so it monitors mostly text and lacks a screenshot archive, a complete record of images, and interactive elements. Silver had many of the same critiques — lack of interactive elements, a generally superficial archiving — for the Wayback Machine (not to mention its dinosaur lag time in updating its archive).
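
To make the text-only criticism concrete: HTML-level tracking of this sort boils down to diffing successive snapshots of a page’s markup, which captures wording changes but says nothing about images, Flash, or layout. A toy illustration using Python’s standard difflib, with invented snapshots:

```python
# Diffing two (invented) snapshots of a campaign page: wording changes show
# up; images and layout do not.
import difflib

before = ["<h1>Jobs plan</h1>", "<p>We will cut taxes by 10%.</p>"]
after  = ["<h1>Jobs plan</h1>", "<p>We will cut taxes by 5%.</p>"]

for line in difflib.unified_diff(before, after,
                                 "2010-10-01", "2010-10-15", lineterm=""):
    print(line)
```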

If his database is the gold standard for Internet archiving, the story is less flattering on Perpetually’s front end — the site visitors use to navigate the database. In the rush to get things up, a shaky vision for the project created the odd mess of creaky widgets, bridge-to-nowhere links, and brilliant data archiving that was the site for the few weeks it was live.

The site as it existed is a good case study in how a great concept with poor execution can crash and burn — and then potentially redeem itself. In Silver’s defense, he had little time to get things together. He began archiving candidates’ sites in June — not knowing exactly what he would do with the data — and managed, with a team of only five, to have a website up for the general public by early October.

But it was painful to use. You could see that some idea, some vision, was at work, but it was hard to see how whoever was behind the thing thought they could pull it off. Links broke, videos gave errors, and community was nonexistent. The annotations page — a Tumblr-style page with no limit on entries — would have brought an average laptop to its knees with a larger user base, and the text-diff mode gave a raw HTML read-out, a fairly frightening chunk of words and symbols specializing in alienation and confusion.

The good news, though, is that as far as Perpetually’s future is concerned, its history doesn’t matter: Perpetually has gone into hibernation for a complete overhaul and redesign. “One of the things I learned is that there’s a huge amount of interest in tracking politicians who are nationally or locally interesting,” said Silver. “But you have to provide a lot better and more immediate goals and feedback.”

Silver has looked at the Guardian’s expense-scandal tracker for ideas on better crowdsourcing mechanisms, like promoting what’s interesting and highlighting top users. And he likes Versionista’s feed-subscription service, which gives users instant notification of changes made by a specific candidate. Silver — far more of a tech geek than a politico — just did not understand a political junkie’s motivations at first, but he’s clearly getting there, and his redesign will likely showcase a savvy pastiche of social media tools culled from around the Internet.

If these changes make the site user-friendly, journalists should rejoice. As it stands, the tools available for retrieving information about a candidate’s online campaign trail are unreliable and incomplete, jeopardizing online accountability. We’ve already seen how easily the record can vanish. Perpetually provides a common resource to circumvent this problem. “That ability to see, to go beyond the Wikipedia summary, is vital to…the history to what this person is saying,” as Silver put it.

Non-journalists — whoever these people might be — have reason to celebrate, too. It’s easy to imagine a day when early website incarnations have Americana value, like the presidential TV ads the Museum of the Moving Image has rediscovered and archived online. The White House itself seems to be getting in on the idea: It has created “frozen in time” portraits of previous administrations’ websites, anointing them with the exclusive “.gov” extension.

These are big ideas — plugging an institutional memory hole, turning a blog into classic memorabilia — and the opportunity is there for Silver to make them a reality. But before any of that happens, he still has to get the details right. He says he has set forth three things he believes his audience wants, and that a remade Perpetually must do for them:

“People want to know about significant changes and want to research the candidates they don’t know about. [They] want to be kept up to date and want a way to do that really easily. The third thing they want is to participate. They all want to improve the election process and want to discuss and do it in an efficient way.”

News organizations, take note: Leading up to 2012, Perpetually’s a site to watch.

September 20 2010

14:00

L.A. Times’ controversial teacher database attracted traffic and got funding from a nontraditional source

Not so long ago, a hefty investigative series from the Los Angeles Times might have lived its life in print, starting on a Monday and culminating with a big package in the Sunday paper. But the web creates the potential for long-form, in-depth work to not just live on online, but to do so in a more useful way than a print-only story could. That’s certainly the case for the Times’ “Grading the Teachers,” a series based on the “value-added” performance of individual teachers and schools. On the Times’ site, users can review the value-added scores of 6,000 third- through fifth-grade teachers — by name — in the Los Angeles Unified School District, as well as individual schools. The decision to publish the names of individual teachers alongside their performance was controversial.

The Times calculated the value-added scores from the 2002-2003 school year through 2008-2009 using standardized test data provided by the school district. The paper hired a researcher from the RAND Corp. to run the analysis, though RAND itself was not involved in the project. From there, in-house data expert and long-time reporter Doug Smith figured out how to present the information in a way that was usable for reporters and understandable to readers.
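
The core of the “value-added” metric is simple, even though the model the Times commissioned controlled for far more: predict each student’s score from the prior year’s score, then credit the teacher with the average amount the class beat or missed that prediction. The toy version below uses invented numbers and a deliberately naive prediction rule; it illustrates the intuition only, not the RAND researcher’s model.

```python
# Toy value-added calculation: a teacher's score is the mean amount by which
# students beat a prediction based on their prior-year scores.
def value_added(students, predict):
    """Mean residual of actual vs. predicted scores for one teacher's class."""
    residuals = [actual - predict(prior) for prior, actual in students]
    return sum(residuals) / len(residuals)

def predict(prior):
    # Naive rule for illustration: expect this year's score to match last year's.
    return prior

classroom = [(62, 70), (75, 78), (81, 80)]  # (prior-year, current-year) scores
print(f"value-added: {value_added(classroom, predict):+.1f} points")
# -> value-added: +3.3 points
```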

As might be expected, the interactive database has been a big traffic draw. Smith said that since the database went live, more than 150,000 unique visitors have checked it out. Some 50,000 came right away, and now the Times is seeing about 4,000 users per day. And those users are engaged: So far the project has generated about 1.4 million page views, which means a typical user is clicking through more than nine pages. That’s sticky content: Parents want to compare their child’s teacher to the others in that grade, their school against the neighbor’s. (I checked out my elementary school alma mater, which boasts a score of, well, average.)
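
Those engagement figures are easy to check: total page views divided by unique visitors gives pages per visitor (numbers from the paragraph above).

```python
# Pages per visitor, using the article's figures.
page_views = 1_400_000
unique_visitors = 150_000
print(f"{page_views / unique_visitors:.1f} pages per visitor")  # -> 9.3
```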

To try to be fair to teachers, the Times gave their subjects a chance to review the data on their page and respond before publication. But that’s not easy when you’re dealing with thousands of subjects, in a school district where email addresses aren’t standardized. An early story in the series directed interested teachers to a web page where they were asked to prove their identity with a birth date and a district email address to get their data early. About 2,000 teachers did before the data went public. Another 300 submitted responses or comments on their pages.

“We moderate comments,” Smith said. “We didn’t have any problems. Most of them were immediately postable. The level of discourse remained pretty high.”

All in all, it’s one of those great journalism moments at the intersection of important news and reader interest. But that doesn’t make it profitable. Even with the impressive pageviews, the story was costly from the start and required serious resource investment on the part of the Times.

To help cushion the blow, the newspaper accepted a grant from the Hechinger Report, the education nonprofit news organization based at Columbia’s Teachers College. [Disclosure: Lab director Joshua Benton sits on Hechinger's advisory board.] But aside from doing its own independent reporting, Hechinger also works with established news organizations to produce education stories for their own outlets. In the case of the Times, it was a $15,000 grant to help get the difficult data analysis work done.

I spoke with Richard Lee Colvin, editor of the Hechinger Report, about his decision to make the grant. Before Hechinger, Colvin covered education at the Times for seven years, and he was interested in helping the newspaper work with a professional statistician to score the 6,000 teachers using the “value-added” metric that was the basis for the series.

“[The L.A. Times] understood that was not something they had the capacity to do internally,” Colvin said. “They had already had conversations with this researcher, but they needed financial support to finish the project.” (Colvin wanted to be clear that he was not involved in the decision to run individual teachers’ names on the Times’ site, just in analyzing the testing data.) In exchange for the grant, the L.A. Times allowed Hechinger to use some of its content and gave them access to the data analysis, which Colvin says could have future uses.

At The Hechinger Report, Colvin is experimenting with how best to carry out its mission of supporting in-depth education coverage — producing content for the Hechinger website, placing its articles with partner news organizations, or making direct subsidies as in the L.A. Times series. They’re currently sponsoring a portion of the salary of a blogger at the nonprofit MinnPost whose beat includes education. “We’re very flexible in the ways we’re working with different organizations,” Colvin said. But, to clarify, he said, “we’re not a grant-making organization.”

As for the L.A. Times’ database, will the Times continue to update it every year? Smith says the district has not yet handed over the 2009-10 school year data, which isn’t a good sign for the Times. The district is battling with the union over whether to use value-added measurements in teacher evaluations, which could make it more difficult for the paper to get its hands on the data. “If we get it, we’ll release it,” Smith said.
