June 11 2013

17:48

Privacy versus transparency: Connecticut bans access to many homicide records post-Newtown

Editor’s note: Our friends at Harvard’s Digital Media Law Project wrote this interesting post on the new, Newtown-inspired limits on public access to information about homicides in Connecticut. We thought it was worth amplifying, so we’re republishing it here.

At a time when citizens increasingly call for government transparency, the Connecticut legislature recently passed a bill withholding graphic records depicting homicides from the public, a response to the records generated by last December’s devastation at Sandy Hook Elementary School.

Though the secret discussions that shaped this bill reportedly date back to at least early April, it did not become public knowledge until an email was leaked to the Hartford Courant on May 21. The initial draft of what became Senate Bill 1149 offered broad protection specifically for families of victims of the December 14 shootings, preventing disclosure of public photographs, videos, 911 audio recordings, death certificates, and more.

Since then, there has been a whirlwind of activity in Connecticut. After a Fox reporter brought to the attention of Newtown families a blog post by Michael Moore suggesting the gruesome photos should be released, parents of children lost in the terrible shooting banded together to write a petition to “keep Sandy Hook crime scene information private.” The petition, which received over 100,000 signatures in a matter of days, aimed to “urge the Connecticut legislature to pass a law that would keep sensitive information, including photos and audio, about this tragic day private and out of the hands of people who’d like to misuse it for political gain.”

Because the petition was clearly concerned with exploitation by Moore and others, Moore later clarified his position, emphasizing that the photos should not be released without the parents’ permission. Rather, he spoke about the potential significance of these photos if released voluntarily to shape the gun control debate, in the same manner that Emmett Till’s mother releasing a photo of her murdered son influenced the civil rights movement.

Like the petitioners, members of the Connecticut legislature responded with overwhelming support for SB 1149. Working into the early hours of June 4, the last day of the legislative session, the state Senate and House approved the bill 33-2 and 130-2, respectively. The bill as approved exempts photographs, film, video, digital or other images depicting a homicide victim from being part of the public record “to the extent that such record could reasonably be expected to constitute an unwarranted invasion of the personal privacy of the victim or the victim’s surviving family members.” The bill particularly protects child victims, exempting from disclosure the names of victims and witnesses under 18 years old. It would also limit disclosure of audio records of emergency and other law enforcement calls as public records, such that portions describing the homicide victim’s condition would not have to be released, though this provision will be reevaluated by a 17-member task force by May 2014.

Though more limited in scope than the original draft with respect to the types of materials that may be withheld, the final bill addresses all homicides committed in the state, not only the massacre in Newtown. It was signed by Governor Dannel Malloy within twelve hours of the legislature’s vote and took effect immediately.

From the beginning, this topic has raised concerns with respect to Connecticut’s Freedom of Information Act and government transparency. In addition to being drafted in secrecy, the bill was not subjected to the traditional public hearing process. All four legislators who voted against SB 1149 raised these democratic concerns, challenging the process and scope of this FOI exemption. This blogger agrees that in its rush to appropriately protect the grieving families of Newtown before the session ended, Connecticut’s legislature went too far in promoting privacy over public access to records, namely with respect to the broad extension of the bill to all homicides and its limitations on releasing 911 calls.

Though influenced primarily by the plight of those in Newtown, SB 1149 makes no distinction based on the gravity or brutality of the homicide, or any other factor that may relate to the strength of the privacy interest. Instead, it restricts access to traditionally public records for all homicides in the state, reaching far beyond the massacre at Sandy Hook. As Chief State’s Attorney Kevin Kane said with respect to photographs depicting injuries to victims and recordings of their distress, “it seems to me that the intrusion of the privacy of the individuals outweighs any public interest in seeing these.” Pressure to expand the bill as Kane desired came primarily from members of the legislature’s Black and Puerto Rican Caucus, who criticized the fairness of differentiating between the protection owed to Newtown families and that due the families of homicide victims in urban areas, where homicides occur more frequently.

This fairness- and equality-based argument raises valid concerns about how the legislature draws the line between protected and unprotected records: if the exemption were limited to the shootings at Sandy Hook, what level of severity would make visual records of a future killing “worthy” of exemption from disclosure? But an all-inclusive exemption like the one Connecticut passed goes too far in restricting the public’s access to important public records. It restricts public access to information so long as a minimal privacy interest is established, regardless of the strength of the interest in disclosure. While restricting the release of photos of the young children who lost their lives this past December is grounded in a strong privacy interest that far outweighs the public or governmental interest, the same cannot be said for every homicide that has occurred or will occur in the state. The potential lasting consequences of this substantial exemption from the FOIA should not be overlooked or minimized in the face of today’s tragedy.

SB 1149 is also problematic in that it extends to recordings of emergency calls. While there is some precedent for restricting access to gruesome photos and video after a tragedy, that precedent is far more limited for audio recordings. Recordings have been made available to the public after many of our nation’s tragic shootings, including the recordings from the first responders in Aurora, 911 calls and surveillance video footage from Columbine, and 911 calls from the Hartford Distributors and Trayvon Martin shootings. Although a compromise permits the general release of these recordings, the bill includes a provision that prevents disclosure of audio segments describing the victim’s condition. While there is a stronger interest in limiting access to the full descriptions of the child victims at Sandy Hook, weighing in favor of nondisclosure in that limited circumstance, emergency response recordings should be released in their entirety in the majority of homicide cases.

This aspect of the law in particular may have grave consequences for the future of the state’s transparency. Records of emergency calls traditionally become public records and are used by the media and ordinary citizens alike to evaluate law enforcement and their response to emergencies. The condition of the victim is an essential element of evaluating law enforcement response. As the president of the Society of Professional Journalists, Sonny Albarado, noted, “If you hide away documents from the public, then the public has no way of knowing whether police…have done their jobs correctly.” In other words, these calls serve as an essential check on government. As a nation which strives for an informed and engaged citizenry, making otherwise public records unavailable is rarely a good thing and should be done with more public discussion and caution than recently afforded by Connecticut’s legislature.

Connecticut’s bill demonstrates a frightening trend away from access and transparency. Colleen Murphy, the executive director of the Connecticut Freedom of Information Commission, has observed a gradual shift “toward more people asking questions about why should the public have access to information instead of why shouldn’t they.” It has never been easy to balance privacy rights with the freedom of information, and this is undoubtedly more difficult in today’s digital age, where materials uploaded to the Internet exist forever. Still, our commitment to self-regulation, progress, and the First Amendment weighs in favor of disclosure. Exceptions should be limited to circumstances, like the Newtown shooting, where the privacy interest strongly outweighs the public’s interest in accessing information. As the Connecticut Council on Freedom of Information wrote in a letter to Governor Malloy, “History has demonstrated repeatedly that governments must favor disclosure. Only an informed society can make informed judgments on issues of great moment.”

Kristin Bergman is an intern at the Digital Media Law Project and a rising 3L at William & Mary Law School. Republished from the Digital Media Law Project blog.

Photo of Connecticut state capitol by Jimmy Emerson used under a Creative Commons license.

August 01 2012

13:46

Can Google Maps + Fusion Tables Beat OpenBlock?

WRAL.com, North Carolina's most widely read online news site, recently published a tool that allows you to search concealed weapons permits down to the street level. It didn't use OpenBlock to do so. Why?

Or, if you're like many journalistically and technically savvy people I've spoken with over the last few months, you could ask: why would they? There's plenty of evidence out there to suggest the OpenBlock application is essentially a great experiment and proof of concept, but a dud as a useful tool for journalists. Many of the public records portions of Everyblock.com -- OpenBlock's commercial iteration -- are months if not years out of date. OpenBlock can't be found anywhere on the public sites of the two news organizations in which the Knight Foundation invested $223,625. And there are only three sites running the open-source code -- two of them at universities, and only one created without funding from the Knight Foundation.

And, you, Thornburg. You don't have a site up and running yet, either.

All excellent points, dear friends. OpenBlock has its problems -- it doesn't work well in multi-city installations, some search functions don't work as you'd expect, there's no easy way to correct incorrect geocoding or even identify possible failures, among other obstacles that I'll describe in greater detail in a later blog post. But the alternatives also have shortcomings. And deciding whether to use OpenBlock depends on which shortcomings will be more tolerable to your journalists, advertisers and readers.

SHOULD I USE OPENBLOCK?

If you want to publish news from multiple cities or from unincorporated areas, or if you serve a rural community, I'd hold off for now. If you visit our public repositories on GitHub, you can see the good work the developers at Caktus have been doing to remove these limitations, and I'm proud to say that we have a private staging site up and running for our initial partner site. But until we make the set-up process easier, you're going to have to hire a Django developer (at anywhere from $48,000 a year to $150 an hour) to customize the site with your logo, your geographic data, and your news items.

The other limitation of OpenBlock right now is that it won't be cheap to maintain once you do get it up and running. The next priority for me is to make the application scale better across multiple installations and therefore lower the maintenance costs. Within the small OpenBlock community, there's debate about how large a server it requires. The very good developers at OpenPlans, who did a lot of heavy lifting on the code after it was open sourced, say that it should run nicely on a "micro" instance of Amazon's EC2 cloud hosting service -- about $180 a year.

But we and Tim Shedor, the University of Kansas student who built LarryvilleKU, find OpenBlock a little too memory-intensive for the "micro" instance. We're on an Amazon Web Services "small" instance, and LarryvilleKU is on a similarly sized virtual server at MediaTemple. That costs more like $720 a year. And if you add a staging server to make sure your code changes break in private instead of in public, you're looking at hosting costs of nearly $1,500 a year.

And that's before your scrapers start breaking. Depending on how conservative you are, you'll want to set aside a budget for fixing each scraper somewhere between one and three times a year. Each fix might be an hour or maybe up to 12 hours of work for a Python programmer (or the good folks at ScraperWiki). If you have three data sources -- arrests, restaurant inspections and home sales, let's say -- then you may get away with a $300 annual scraper maintenance cost, or it may set you back as much as $15,000 a year.
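
To see where that range comes from, here's the arithmetic as a quick sketch. The hourly rates are my assumptions for illustration, not quotes from any programmer or from ScraperWiki:

```python
# Back-of-the-envelope scraper maintenance budget for three data sources.
# The hourly rates are assumptions for illustration, not quoted prices.
SOURCES = 3
LOW_RATE, HIGH_RATE = 100, 140  # assumed $/hour for a Python programmer

best_case = SOURCES * 1 * 1 * LOW_RATE     # one 1-hour fix per scraper per year
worst_case = SOURCES * 3 * 12 * HIGH_RATE  # three 12-hour fixes per scraper per year

print(f"annual scraper maintenance: ${best_case:,} to ${worst_case:,}")
# annual scraper maintenance: $300 to $15,120
```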

I've got some ideas on how to reduce those scraper costs, too, but more on that later as well.

Of course, if you have someone on staff who does Python programming, who's done some work with public records, has published a few Django sites, and has time to spare, then your costs will go down significantly.

But just in case you don't have such a person on staff or aren't ready to make this kind of investment, what are your alternatives?

GOOGLE MAPS AND FUSION TABLES

Using a Google Map on your news website is a little like playing the saxophone. It's probably the easiest instrument to learn how to play poorly, but pretty difficult to make it really sing. Anyone can create a Google Map of homicides or parking garages or whatever, but it's going to be a static map of only one schema, and it won't be searchable or sortable.

On the other hand, you can also use Google Maps and Fusion Tables to build some really amazing applications, like the ones you might see in The Guardian or on The Texas Tribune or WNYC or The Bay Citizen. You can do all this, but it takes some coding effort and probably a bit more regular hands-on care and feeding to keep the site up to date.

I've taken a look at how you might use Google's data tools to replicate something like OpenBlock, although I've not actually done it. If you want to give it a whirl and report back, here's my recipe.

A RECIPE FOR REPLICATING OPENBLOCK

Step 1. Create one Google Docs spreadsheet for each schema, up to a maximum of four spreadsheets. And create one Google Fusion Table for each schema, up to a maximum of four tables.

Step 2. If the data you want is in a CSV file that's been published to the web, you can populate the spreadsheet with a Google Docs function called ImportData. This function -- as well as its sister functions ImportHTML and ImportXML -- will only update 50 records at a time. And I believe this function will pull in new data from the CSV about once an hour. I don't know whether it will append the new rows or overwrite them, or what it would do if only a few of the fields in a record change. If you're really lucky, the data will be in an RSS feed and you can use the ImportFeed function to get past this 50-record limit.

Of course, in the real world almost none of your data will be in these formats. None of mine are. And in that case, you'd have to either re-enter the data into Google Docs by hand or use something like ScraperWiki to scrape a datasource and present it as a CSV or a feed.
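
If you go the ScraperWiki route, the heart of such a scraper is small. Here's a minimal sketch in Python; the URL and table layout are hypothetical placeholders, and any real source would need its own parsing logic:

```python
# Minimal sketch: scrape a hypothetical HTML table and republish it as
# CSV that a Google Docs ImportData() formula could consume.
import csv
import sys

import requests
from bs4 import BeautifulSoup

URL = "http://example.gov/arrests.html"  # hypothetical data source

response = requests.get(URL, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

writer = csv.writer(sys.stdout)
writer.writerow(["date", "charge", "address"])

# Assumes one table row per arrest with three cells -- exactly the kind
# of fragile assumption that breaks when the agency redesigns its page.
for row in soup.select("table tr")[1:]:
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) == 3:
        writer.writerow(cells)
```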

Step 3. Use a modification of this script to automatically pull the data -- including updates -- from the Google Docs spreadsheet into the corresponding Fusion table you created for that schema.
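
The linked script is the real recipe; purely as an illustration of what it's doing, here's a rough Python sketch against the Fusion Tables SQL endpoint as it existed at the time (the service has since been retired, and the table ID, token and rows below are placeholders):

```python
# Rough sketch: push rows into a Fusion Table via its SQL endpoint.
# Table ID, OAuth token and row data are placeholders.
import requests

FUSION_SQL_URL = "https://www.googleapis.com/fusiontables/v1/query"
TABLE_ID = "1abc...xyz"        # placeholder Fusion Table ID
ACCESS_TOKEN = "ya29.example"  # placeholder OAuth token

rows = [("2012-07-30", "larceny", "100 Main St, Whiteville, NC")]

for date, charge, address in rows:
    sql = (
        f"INSERT INTO {TABLE_ID} (Date, Charge, Address) "
        f"VALUES ('{date}', '{charge}', '{address}')"
    )
    requests.post(
        FUSION_SQL_URL,
        params={"sql": sql},
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=30,
    ).raise_for_status()
```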

Step 4. Find the U.S. Census or local county shapefiles for any geographies you want -- such as ZIP codes or cities or school districts -- and convert them to KML.
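
One way to do that conversion, assuming you have GDAL installed, is its ogr2ogr command-line tool (the file names here are illustrative):

```python
# Convert a Census shapefile to KML with GDAL's ogr2ogr tool.
# Requires GDAL; the file names are illustrative.
import subprocess

subprocess.run(
    ["ogr2ogr", "-f", "KML", "zip_codes.kml", "tl_2010_37_zcta510.shp"],
    check=True,
)
```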

Step 5. Upload that geographic information into another Fusion Table.

Step 6. Merge the Fusion table from Step 3 with the Fusion table from Step 5.

Step 7. This is really a thousand little steps, each depending on which of OpenBlock's user interface features you'd like to replicate. And, really, it should be preceded by step 6a -- learn JavaScript, SQL, CSS and HTML. Once you've done that, you can build tools that let users filter, search and map the data.

And there's even at least one prototype of using server-side scripting and Google's APIs to build a relatively full-functioning GIS-type web application: https://github.com/odi86/GFTPrototype

After all that, you will have some of the features of OpenBlock, but not others.

Some key OpenBlock features you can replicate with Google Maps and Fusion Tables:

  • Filter by date, street, city, ZIP code or any other field you choose. Fusion Tables is actually a much better interface for searching and filtering -- or doing any kind of reporting work -- than OpenBlock.
  • Show up to four different kinds of news items on one map (five if you don't include a geography layer).
  • Conduct proximity searches. "Show me crimes reported within 1 mile of a specific address." (A sketch of such a query follows this list.)
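
That last feature maps onto Fusion Tables' spatial SQL. A sketch, assuming a geocoded column named Location and the SQL endpoint as it existed at the time (table ID and API key are placeholders; 1,609 meters is about a mile):

```python
# Sketch of a Fusion Tables proximity query: rows within ~1 mile of a
# point. Table ID, API key and the Location column name are placeholders.
import requests

TABLE_ID = "1abc...xyz"   # placeholder
API_KEY = "AIza-example"  # placeholder

sql = (
    f"SELECT Date, Charge, Address FROM {TABLE_ID} "
    "WHERE ST_INTERSECTS(Location, CIRCLE(LATLNG(35.9132, -79.0558), 1609))"
)
response = requests.get(
    "https://www.googleapis.com/fusiontables/v1/query",
    params={"sql": sql, "key": API_KEY},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```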

WHAT YOU CAN'T REPLICATE

The OpenBlock features you can't replicate with Google:

  • Use a data source that is anything other than an RSS feed, HTML table, CSV or TSV. That's right, no XLS files unless you manually import them.
  • Use a data source for which you need to combine two CSV files before import. This is the case with our property transactions and restaurant inspections. (A preprocessing sketch that works around this follows the list.)
  • Update more than 50 records at a time. Definitely a problem for police reports in all but the smallest towns.
  • Use a data source that doesn't store the entire address in a single field. That's a problem for all the records with which we're working.
  • Map more than 100,000 rows in any one Fusion table. In rural counties, this probably wouldn't be a concern. In Columbus County, N.C., there are only 45,000 parcels of land and 9,000 incidents and arrests a year.
  • Use data sources that are larger than 20MB or 400,000 cells. I don't anticipate this would be a problem for any dataset in any county where we're working.
  • Plot more than 2,500 records a day on a map. I don't anticipate hitting this limit either, especially after the initial upload of data.
  • Parse text for an address -- so you can't map news articles, for example.
  • Filter to the block level. If Main Street runs for miles through several towns, you're not going to be able to narrow your search to anything relevant.
  • Create a custom RSS feed, or email alert.
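
Some of these limits can be worked around with a preprocessing pass before upload. Here's a minimal sketch that joins two hypothetical CSV files on a shared key and collapses separate address fields into the single field Fusion Tables' geocoder expects; every file name and column is a placeholder:

```python
# Preprocessing sketch: join two CSVs on a shared permit ID and build a
# single address field. File names, keys and columns are hypothetical.
import csv

# Load inspection details keyed by permit ID.
details = {}
with open("inspection_details.csv", newline="") as f:
    for row in csv.DictReader(f):
        details[row["permit_id"]] = row

with open("inspections.csv", newline="") as src, \
     open("combined.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow(["permit_id", "name", "score", "address"])
    for row in csv.DictReader(src):
        extra = details.get(row["permit_id"], {})
        # Collapse number/street/city/ZIP into one geocodable field.
        address = " ".join(part for part in (
            row.get("street_number", ""),
            row.get("street_name", ""),
            extra.get("city", ""),
            extra.get("zip", ""),
        ) if part)
        writer.writerow([row["permit_id"], extra.get("name", ""),
                         extra.get("score", ""), address])
```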

THE SEO ADVANTAGE

And there's one final feature of OpenBlock that you can't replicate using Google tools without investing a good deal of manual, rote set-up work -- taking advantage of SEO or social media sharing by having a unique URL for a particular geography or news item type. Ideally, if someone searches for "home sales in 27514" I want them to come to my site. And if someone wants to post to Facebook a link to a particular restaurant that was scolded for having an employee with a finger-licking tendency (true story), I'd want them to be able to link directly to that specific inspection incident without forcing their friends to hunt through a bunch of irrelevant 100 scores.

To replicate OpenBlock's URL structure using Google Maps and Fusion Tables, you'd have to create a unique web page and a unique Google map for each city and ZIP code. The geography pages would display a polygon of the selected geography, whether it's a ZIP code or city or anything else, and all of the news items for that geography (up to four schemas, such as arrests, incidents, property sales, and restaurant inspections). That's 55 map pages.

Then you'd have to create a map and a page for each news item type. That's four pages, four Fusion tables, and four Google Docs spreadsheets.
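
The rote part of that page generation is at least scriptable. A sketch, with a hypothetical template and ZIP code list:

```python
# Sketch: stamp out one landing page per ZIP code so each geography gets
# a unique, linkable URL. The template and ZIP list are hypothetical.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html><head><title>Home sales in {zip_code}</title></head>
<body>
  <h1>Home sales in {zip_code}</h1>
  <!-- Embed a Fusion Tables map layer filtered to this ZIP here. -->
</body></html>
"""

zip_codes = ["27514", "28450", "28472"]  # placeholder list

for zip_code in zip_codes:
    with open(f"sales-{zip_code}.html", "w") as f:
        f.write(PAGE_TEMPLATE.format(zip_code=zip_code))
```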

Whew. I'm going to stick with our work on improving the flexibility and scalability of OpenBlock. But it's still worth looking at Google Maps and Fusion Tables for some small and static data use cases. Other tools such as Socrata's Open Data, Caspio and Tableau Public are also worth your time as you begin to think about publishing public data. Each of those has its own maintenance costs, strengths and weaknesses, but the real obstacle for all of these tools is public data that isn't in any usable format. We're looking hard at solving that problem with a combination of scraping and crowdsourcing, and I'll report what we've found in an upcoming post.

Ryan Thornburg researches and teaches online news writing, editing, producing and reporting as an assistant professor in the School of Journalism and Mass Communication at the University of North Carolina at Chapel Hill. He has helped news organizations on four continents develop digital editorial products and use new media to hold powerful people accountable, shine light in dark places and explain a complex world. Previously, Thornburg was managing editor of USNews.com, managing editor for Congressional Quarterly's website and national/international editor for washingtonpost.com. He has a master's degree from George Washington University's Graduate School of Political Management and a bachelor's from the University of North Carolina at Chapel Hill.

June 27 2011

17:30

Deeper into data: U.K.-based ScraperWiki plans new tools and U.S. expansion with News Challenge

Looking over the scope of the Knight News Challenge, from its beginning to the winners announced this year, it’s clear data is king. From this year’s data-mining projects alone — whether inside the confines of the newsroom, posting public records in rural areas, or delivering vital information on clean water — we can safely say that the Knight Foundation sees data as a big part of the future for journalism and communities.

But Francis Irving says we’ve only scratched the surface on how data is delivered, displayed and consumed. “It’s an unexplored area,” said Irving, CEO of ScraperWiki. “We’re right at the beginning of this.”

As you may have guessed from the name, ScraperWiki is a tool to help people collect and publish data through a simple online interface that also serves as a repository for code. (Equal parts scraper and wiki.)
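
In practice, a classic ScraperWiki scraper was only a few lines of Python. This sketch follows the interface as the library documented it; the URL and fields are placeholders:

```python
# Sketch of a ScraperWiki scraper: fetch a page, parse a table, and save
# rows to the hosted SQLite datastore. URL and fields are placeholders.
import scraperwiki
from bs4 import BeautifulSoup

html = scraperwiki.scrape("http://example.gov/lobbying.html")
soup = BeautifulSoup(html, "html.parser")

for i, row in enumerate(soup.select("table tr")[1:]):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) >= 2:
        scraperwiki.sqlite.save(
            unique_keys=["id"],
            data={"id": i, "lobbyist": cells[0], "amount": cells[1]},
        )
```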

As a winner of this year’s Knight News Challenge, ScraperWiki plans to use its three-year, $280,000 grant to expand both its product and its reach. With a home base in Liverpool, ScraperWiki also hopes to cross the Atlantic and replicate its work helping journalists and ordinary citizens uncover data. “We want to lower the barrier for someone to do general purpose programming,” Irving said.

Irving told me a number of reporters and programmers in the U.K. have teamed up to use ScraperWiki to find stories and give new life to old data. James Ball, an investigative reporter for The Guardian, used ScraperWiki to write a story on the influence and spending of lobbyists on members of Parliament. ScraperWiki was also used by U.K. officials to create a search site for services provided by the government.

One of the reasons for ScraperWiki’s success, Irving said, is the meetups it throws to bring journalists and programmers face to face. Part of the expansion plan under the News Challenge grant is launching similar, Hacks/Hackers-style events here in the U.S., which will also serve as an introduction to ScraperWiki. Irving said the meetups are less about serving up punch and pie than about fostering the kind of talk that happens when you bring different perspectives together around a shared interest.

“The value is in gathering, structuring, and building things people haven’t thought of yet,” he said.

More broadly, the team plans to build out a new set of features for ScraperWiki, including an embargo tool that would let journalists create structured datasets to be released when a story is published; an on-demand tool for a seamless process of finding and releasing records; and alerts that could notify journalists of changes to databases they follow.
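
The alerts idea, at its core, is a polling loop: fetch each dataset on a schedule, hash it, and notify the journalist when the hash changes. A minimal sketch under those assumptions (the URL is a placeholder, and a real version would send an email rather than print):

```python
# Minimal change-alert sketch: poll a dataset, hash the response body,
# and flag any change since the last check. URL is a placeholder.
import hashlib
import time

import requests

URL = "http://example.gov/contracts.csv"  # placeholder dataset

last_digest = None
while True:
    body = requests.get(URL, timeout=30).content
    digest = hashlib.sha256(body).hexdigest()
    if last_digest is not None and digest != last_digest:
        print(f"dataset changed: {URL}")  # stand-in for a real alert
    last_digest = digest
    time.sleep(3600)  # check hourly
```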

And that gets to Irving’s larger hopes for uses of data, either in storytelling or surfacing vital information for the public’s use. Data journalism, he said, can serve a great purpose, but has to expand beyond simply accessing and assessing government records for stories. That’s why Irving is interested in the new generation of news apps that step outside of the article or investigative series, that take a different approach to visualization and display.

Irving said they’re happy to have a partner like Knight who has knowledge and connections in the world of journalism, which will be a help when ScraperWiki comes to these shores. The key ingredient, he said, is partnering the creative expertise of programmers, who can see new angles for code, and journalists, who can curate what’s important to the community.

“There’s going to be lots of things happening when you combine professional journalists with computer programs and they can supercharge each other,” he said.

March 01 2011

15:07

DocumentCloud Passes Major Milestone: 1 Million Pages Uploaded

DocumentCloud's Jeremy Ashkenas collaborated on this post.

It has been less than a year since DocumentCloud began adding users to our beta. Late Monday morning, a user uploaded our millionth page of primary source documents.

The thousands of documents in our catalog have arrived in small batches: five pages here, twenty there. The vast majority of the 65,000 documents that make up those million pages remain private, but we're fast closing in on 10,000 public documents in our catalog.

Broad Appeal

Journalists are using DocumentCloud to publish all sorts of documents.

Remaking History

Documents in our catalog reach back into the past, as well. In 1970, Ruben Salazar was killed by police while covering an anti-war protest in East Los Angeles. A story rife with controversy, questions, and suspicions, his death became a rallying point in the Mexican American civil rights movement. Forty years later -- after refusing a public records request for documents that might shed some light on the circumstances of his death -- the Los Angeles County Sheriff's Department agreed to turn the files over to the Office of Independent Review.

While Los Angeles Times reporters waited for the report, they assembled their own folio of early clippings on Ruben Salazar. Readers can review FBI files obtained by the Times in 1999 and LAPD records on the department's repeated clashes with the journalist as well as a draft of the report prepared by the Office of Independent Review.

Join the Cloud

You can browse recently published documents by searching for "filter: published" or read up on other searches you might want to run. Here's hoping that the next year brings millions more pages, and more great document-driven reporting.
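
The same search works programmatically. Here's a sketch against DocumentCloud's search API as it was documented at the time (the endpoint and response shape may have changed since):

```python
# Sketch: query DocumentCloud's search API for recently published
# documents, mirroring the "filter: published" search on the site.
import requests

response = requests.get(
    "https://www.documentcloud.org/api/search.json",
    params={"q": "filter:published"},
    timeout=30,
)
response.raise_for_status()
for doc in response.json().get("documents", []):
    print(doc.get("title"))
```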
