
August 09 2012

12:19

Two reasons why every journalist should know about scraping (cross-posted)

This was originally published on Journalism.co.uk – cross-posted here for convenience.

Journalists rely on two sources of competitive advantage: being able to work faster than others, and being able to get more information than others. For both of these reasons, I love scraping: it is both a great time-saver, and a great source of stories no one else has.

Scraping is, simply, getting a computer to capture information from online sources. They might be a collection of webpages, or even just one. They might be spreadsheets or documents which would otherwise take hours to sift through. In some cases, it might even be information on your own newspaper website (I know of at least one journalist who has resorted to this as the quickest way of getting information that the newspaper has compiled).

In May, for example, I scraped over 6,000 nomination stories from the official Olympic torch relay website. It allowed me to quickly find both local feelgood stories and rather less positive national angles. Continuing to scrape also led me to a number of stories which were being hidden, while having the dataset to hand meant I could instantly pull together the picture of a single day on which one unsuccessful nominee would have run, and I could test the promises made by organisers.

ProPublica scraped payments to doctors by pharma companies; the Ottawa Citizen ran stories based on its scrape of health inspection reports. In Tampa Bay they run an automatically updated page on mugshots. And it’s not just about the stories: last month local reporter David Elks was using Google spreadsheets to compile a table from a Word document of turbine applications for a story which, he says, “helped save the journalist probably four or five hours of manual cutting and pasting.”

The problem is that most people imagine that you need to learn a programming language to start scraping - but that’s not true. It can help - especially if the problem is complicated. But for simple scrapers, something as easy as Google Docs will work just fine.

I tried an experiment with this recently at the News:Rewired conference. With just 20 minutes to introduce a room full of journalists to the complexities of scraping, and get them producing instant results, I used some simple Google Docs functions. Incredibly, it worked: by the end The Independent’s Jack Riley was already scraping headlines (the same process is outlined in the sample chapter from Scraping for Journalists).

And Google Docs isn’t the only tool. Outwit Hub is a must-have Firefox plugin which can scrape through thousands of pages of tables, and even Google Refine can grab webpages too. Database scraping tool Needlebase was recently bought by Google, too, while Datatracker is set to launch in an attempt to grab its former users. Here are some more.

What’s great about these simple techniques, however, is that they can also introduce you to concepts which come into play with faster and more powerful scraping tools like Scraperwiki. Once you’ve become comfortable with Google spreadsheet functions (if you’ve ever used =SUM in a spreadsheet, you’ve used a function) then you can start to understand how functions work in a programming language like Python. Once you’ve identified the structure of some data on a page so that Outwit Hub could scrape it, you can start to understand how to do the same in Scraperwiki. Once you’ve adapted someone else’s Google Docs spreadsheet formula, then you can adapt someone else’s scraper.
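
To make that last step concrete, here is a minimal sketch of the same idea in Python – a function which, like =ImportXML, takes a web address and an XPath query and returns whatever matches. It assumes you have the requests and lxml libraries installed, and the URL and query below are placeholders rather than a real example; treat it as an illustration, not a finished scraper.

import requests
from lxml import html

def import_xml(url, query):
    # Fetch the page and return the text of everything matching the XPath query,
    # much as =ImportXML(url, query) does in a Google Docs spreadsheet.
    page = requests.get(url)
    tree = html.fromstring(page.content)
    return [element.text_content().strip() for element in tree.xpath(query)]

# Hypothetical example: grab headlines from a news front page.
# Inspect the real page (for instance with your browser's developer tools)
# to work out the right query for the site you care about.
for headline in import_xml("http://www.example.com/news", "//h2/a"):
    print(headline)

The two arguments do exactly the job of the two parts of the spreadsheet formula – everything else is plumbing.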

I’m saying all this because I wrote a book about it. But, honestly, I wrote a book about this so that I could say it: if you’ve ever struggled with scraping or programming, and given up on it because you didn’t get results quickly enough, try again. Scraping is faster than FOI, can provide more detailed and structured results than a PR request – and allows you to grab data that organisations would rather you didn’t have. If information is a journalist’s lifeblood, then scraping is becoming an increasingly key tool to get the answers that a journalist needs, not just the story that someone else wants to tell.

December 08 2011

09:22

4 ways to publish your data online

I’ve written a post on the Help Me Investigate blog on a number of different ways to publish data online, from converting Excel spreadsheets into HTML tables, to using Google Docs, or using data-sharing platforms like BuzzData. You may find it useful.

July 29 2011

08:24

SFTW: How to scrape webpages and ask questions with Google Docs and =importXML

Image: XML puzzle cube, by dullhunk on Flickr

Here’s another Something for the Weekend post. Last week I wrote a post on how to use the =importFeed formula in Google Docs spreadsheets to pull an RSS feed (or part of one) into a spreadsheet, and split it into columns. Another formula which performs a similar function more powerfully is =importXML.

There are at least 2 distinct journalistic uses for =importXML:

  1. You have found information that is only available in XML format and need to put it into a standard spreadsheet to interrogate it or combine it with other data.
  2. You want to extract some information from a webpage – perhaps on a regular basis – and put that in a structured format (a spreadsheet) so you can more easily ask questions of it.

The first task is the easiest, so I’ll explain how to do that in this post. I’ll use a separate post to explain the latter.

Converting an XML feed into a table

If you have some information in XML format it helps if you have some understanding of how XML is structured. A backgrounder on how to understand XML is covered in this post explaining XML for journalists.

It also helps if you are using a browser which is good at displaying XML pages: Chrome, for example, not only staggers and indents different pieces of information, but also allows you to expand or collapse parts of that, and colours elements, values and attributes (which we’ll come on to below) differently.

Say, for example, you wanted a spreadsheet of UK council data, including latitude, longitude, CIPFA code, and so on – and you found the data, but it was in XML format at a page like this:  http://openlylocal.com/councils/all.xml

To pull that into a neatly structured spreadsheet in Google Docs, type the following into the cell where you want the import to begin (try typing in cell A2, leaving the first row free for you to add column headers):

=ImportXML("http://openlylocal.com/councils/all.xml", "//council")

The formula (or, more accurately, function) needs two pieces of information, which are contained in the parentheses and separated by a comma: a web address (URL), and a query. Or, put another way:

=importXML("theURLinQuotationMarks", "theBitOfTheXMLthatYouWant")

The URL is relatively easy – it is the address of the XML file you are reading (it should end in .xml). The query needs some further explanation.

The query tells Google Docs which bit of the XML you want to pull out. It uses a language called XPath – but don’t worry, you will only need to note down a few queries for most purposes.
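
For reference, a handful of patterns cover most simple queries – these are standard XPath, so they should work in =importXML, though it is always worth testing them against your particular file:

//council – every <council> element, wherever it sits in the file
//council/name – the <name> element directly inside each <council>
//council//name – any <name> element anywhere inside each <council>
//council[1] – just the first <council>
//council/@id – the value of an id attribute on each <council>, if the file uses one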

Here’s an example of part of that XML file shown in the Chrome browser:

Image: XML from OpenlyLocal, viewed in Chrome

The indentation and triangles indicate the way the data is structured. So, the <councils> tag contains at least one item called <council> (if you scrolled down, or clicked on the triangle to collapse <council> you would see there are a few hundred).

And each <council> contains an <address>, <authority-type>, and many other pieces of information.
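
As a rough sketch of that structure – with invented values, and only a few of the many elements each <council> actually contains – the file looks something like this:

<councils>
  <council>
    <name>Anytown District Council</name>
    <address>Town Hall, High Street, Anytown</address>
    <authority-type>District</authority-type>
    ...
  </council>
  <council>
    ...
  </council>
</councils>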

If you wanted to grab every <council> from this XML file, then, you use the query “//council” as shown above. Think of the // as a replacement for the < in a tag – you are saying: ‘grab the contents of every item that begins <council>’.

You’ll notice that in your spreadsheet where you have typed the formula above, it gathers the contents (called a value) of each tag within <council>, each tag’s value going into its own column – giving you dozens of columns.

You can continue this logic to look for tags within tags. For example, if you wanted to grab the <name> value from within each <council> tag, you could use:

=ImportXML("http://openlylocal.com/councils/all.xml", "//council//name")

You would then only have one column, containing the names of all the councils – if that’s all you wanted. You could of course adapt the formula again in cell B2 to pull another piece of information. However, you may end up with a mismatch of data where that information is missing – so it’s always better to grab all the XML once, then clean it up on a copy.

If the XML is more complex then you can ask more complex questions – which I’ll cover in the second part of this post. You can also put the URL and/or query in other cells to simplify matters, e.g.

=ImportXML(A1, B1)

Where cell A1 contains http://openlylocal.com/councils/all.xml and B1 contains //council (note the lack of quotation marks). You then only need to change the contents of A1 or B1 to change the results, rather than having to edit the formula directly.

If you’ve any other examples, ideas or corrections, let me know. Meanwhile, I’ve published an example spreadsheet demonstrating all the above techniques here.


July 20 2011

14:42

How to collaborate (or crowdsource) by combining Delicious and Google Docs

Image: RSS girl, by HeatherWeaver on Flickr

During some training in open data I was doing recently, I ended up explaining (it’s a long story) how to pull a feed from Delicious into a Google Docs spreadsheet. I promised I would put it down online, so: here it is.

In a Google Docs spreadsheet the formula =importfeed will pull information from an RSS feed and put it into that spreadsheet. Titles, links, datestamps and other parts of the feed will each be separated into their own columns.

When combined with Delicious, this can be a useful way to collect together pages that have been bookmarked by a group of people, or any other feed that you want to analyse.

Here’s how you do it:

1. Decide on your tag, network or user

The spreadsheet will pull data from an RSS feed. Delicious provides so many of these that you are spoilt for choice. Here are the main three:

A tag

Used by various people.

Advantages: quick startup – all you need to do is tell people the tag (make sure this is unique, such as ‘unguessable2012’).

Disadvantages: others can hijack the tag – although this can be cleaned from the resulting data.

A network

Consisting of the group of people who are bookmarking:

Advantages: group cannot be infiltrated.

Disadvantages: setup time – may need to create a new account to build the network around.

A user

Created for this purpose:

Advantages: if users are not confident in using Delicious, this can be a useful workaround.

Disadvantages: longer setup time – you’ll need to create a new account, and work out an easy way for it to automatically capture bookmarks from the group. One way is to pull an RSS feed of any mentions on Twitter and use Twitterfeed to auto-tweet them with a hashtag, and then Packrati.us to auto-bookmark all tweeted links (a similar process is detailed here).

The page for each follows a consistent format, and you’ll find the RSS feed link at the bottom of it:

Delicious.com/tag/unguessable2012

Delicious.com/network/unguessable2012

Delicious.com/unguessable2012

2. Create your spreadsheet

In Google Docs, create a new spreadsheet and in the first cell type the following formula:

=importfeed("

…adding your RSS feed after the quotation mark, and then this at the end:

")

So it looks something like this:

=importfeed("http://feeds.delicious.com/v2/rss/tag/unguessable2012?count=15")

Now press enter and after a moment the spreadsheet should populate with data from that feed.

You’ll note, however, that at most you will have only 15 rows of data here. That’s because the RSS feed you’ve copied includes that limitation.

If you look at the RSS feed you’ll see an easy clue on how to change this…

So, try editing it so that the count=15 part of that URL reads count=20 instead. You can put a higher number – but Google Docs will limit results to 20 at a time.

3. Collecting contributions

Technically, you’re now all set up. The bigger challenge is, of course, in getting people to contribute. It helps if they can see the results – so think about publishing your spreadsheet.

You’ll also need to make sure that you check it regularly and copy into a backup spreadsheet so you don’t miss results after that top 20.

But if you find it doesn’t work, it may be worth thinking of other ways of doing this – for example, with a Google Form, or using =importfeed with the RSS feed for a Twitter search on a hashtag containing links (Twitter’s advanced search allows you to limit results accordingly – and all search results come with an RSS feed link like this one).

Of course there are far more powerful ways of doing this which are worth exploring once you’ve understood the basic possibilities.

Doing more with =importfeed

The =importfeed formula has some other elements that we haven’t used.

Another way to do this, for example, is to paste your RSS feed URL into cell A1 and type the following anywhere else:

=importfeed(A1, "Items Title", FALSE, 20)

This has 4 parts in the parentheses:

  1. A1 – this points at the URL you just pasted in cell A1, and means that you only have to change what’s in A1 to change the feed being grabbed, rather than having to edit the formula itself
  2. “Items Title” – this is the part of the feed that is being grabbed. If you look in the feed you will see a part that says <item> and within that, an element called <title> – that’s it. You could change this to “Items URL” to get the link from each <item> instead, for example. Or you could just put “Items” and get all 5 parts of each item (title, author, URL, date created, and summary). You can also use “feed” to get information about the feed itself, or “feed URL” or “feed title” or “feed description” to get that single piece of information.
  3. FALSE – this just says whether you want a header row or not. Setting to TRUE will add an extra row saying ‘Title’, for example.
  4. 20 – the number of results you want.

You can see an example spreadsheet with 3 sheets demonstrating different uses of this formula here.
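
If you eventually need more than the 20 rows Google Docs will return, this is where the “more powerful ways” mentioned above start – and they can be only a few lines long. As a rough sketch, the same Delicious feed could be read in Python with the feedparser library (an assumption on my part; any RSS library would do the same job):

import feedparser

# The same tag feed used in the spreadsheet example above; how high you can
# push count is down to Delicious rather than anything in this script.
feed = feedparser.parse("http://feeds.delicious.com/v2/rss/tag/unguessable2012?count=100")

for entry in feed.entries:
    # Each entry carries roughly the same parts that =importfeed exposes:
    # title, link, date and summary.
    print(entry.title, entry.link, entry.get("published", ""))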


June 19 2011

05:05

Not The Guardian - Web-first workflow with Google Docs, WordPress and InDesign integration? ... For free

Mediabistro :: The Bangor Daily News announced this week that it completed its full transition to open source blogging software, WordPress. And get this: The workflow integrates seamlessly with InDesign, meaning the paper now has one content management system for both its web and print operations. And if you’re ambitious enough, you can do it too – the paper has open-sourced all the code!

Continue to read Lauren Rabaino, www.mediabistro.com

Docs to WordPress to InDesign, video by William P.D., www.screenr.com

May 30 2011

15:21

Tools for journalists - F-Secure found phishing sites hosted on Google Docs

ReadWriteWeb :: The security researchers at F-Secure have discovered several phishing (a way of attempting to acquire sensitive information) sites hosted on Google Docs, Google's online office suite. This is not an uncommon occurrence, it seems. According to a new blog post on the security firm's site, the team says "we regularly see phishing sites via Google Docs spreadsheets and hosted on spreadsheets.google.com."

Continue to read Sarah Perez, www.readwriteweb.com

F-Secure blog entry www.f-secure.com

March 29 2011

18:00

How Project Argo Members Communicate Across Time Zones

Project Argo is an ambitious undertaking. It involves networking NPR with 12 member stations spanning three time zones with a different mix of bloggers and editors at each station. The stations cover a variety of regionally focused, nationally resonant topics that range from climate change to local music.

Communicating effectively within these parameters has required creativity and experimentation. And we're still learning.

I'll break down our various approaches -- what we've tried, what's working, and what we're still working on -- using the three tiers of communication: One-to-one, one-to-many, and many-to-many.

One-to-one communication

These exchanges with the stations have offered some of the most intensive and valuable interactions of the project. When we started, much of our communication happened through the typical channels -- lengthy, one-on-one phone calls and emails to brainstorm, strategize, give feedback, and train.

Email has a tendency to be high friction. Messages can take a long time to compose and a long time to digest, and much is often lost on both ends of the process.

But of course email still has its advantages. It's asynchronous -- in other words, you can carry on a thread without needing to be on the same schedule. It's great for laying information out in precise detail, whether you're talking about metrics or line-editing posts. And it's invaluable for documenting your communication and finding it later.

For working remotely, there's nothing like a phone call or Skype session to have a good back-and-forth conversation. There are drawbacks here too, of course: Calls longer than 10 minutes need to be scheduled, and lots of good information can escape without being documented.

Lately, I've taken to augmenting phone conversations with a PiratePad to help with that last problem. Like Google Docs, PiratePad allows two or more people to see what the others are writing in real time. The difference is that PiratePad shows your document-mate's typing character by character, rather than refreshing at regular intervals, so it's a little more immediate. This combo has been excellent.

As the project has evolved, most of our one-on-one contact has become pretty quick and spontaneous. Twitter has proven to be one of the best tools for communicating one-on-one. Since all the bloggers are on Twitter, a quick DM conversation often suffices to get across what we need to convey or inquire about.

Heather Goldstone, who blogs for WGBH at Climatide, said Twitter was her favorite tool for staying in touch. "Just in general, I've gotten really hooked on Twitter," she said. "It's more like texting instead of email. If the other person's around, it's got a faster turnaround time and more of a conversational feel than email."

Despite the surfeit of tools to choose from, however, the most valuable one-on-one interactions we can have are in person. For as much as we can do by email, over the phone, through Twitter and other means, nothing replaces being able to sit down face-to-face with our station colleagues, or being able to peer over their shoulders as they're working on their Argo sites. Of course, this is the most time- and resource-intensive way to communicate. But there's still nothing like it.

One-to-Many Communication

We occasionally need to broadcast messages to all the stations involved in the project. For that, we mainly use Basecamp, which gives us a good common archive of files and messages, and integrates pretty well with everyone's email. The biggest problem with Basecamp is that all replies to a message are sent to everyone who received the original message. This can create quite a cascade of emails when a lot of folks weigh in on a thread.

We regularly lead webinars for the Argo bloggers, and we've tried a variety of approaches to doing this. We started out setting these up through a common, organization-wide GoToMeeting account, but this required quite a bit of advance set-up and coordination, and one of the participants invariably had technical troubles. Plus, we've had difficulty recording the webinars. (GoToMeeting's recording technology only works on PCs; my teammates and I use Macs. Plus, the GoToMeeting software tends to conflict with screencasting tools we might use to record the desktop and audio.)

We've since moved to a lower-fidelity approach using free tools: Join.Me to share desktops, and FreeConference.com for voice communication.

The voice controls in FreeConference.com's system are reasonably robust. Call organizers can mute everyone but the presenter, allowing call attendees to un-mute themselves selectively. For a small fee, FreeConference.com allows us to record the audio when we need to. Pair that audio up with video of the related slides, and you've got a webinar recording.

When our goal is capturing best practices all the stations can replicate, or documenting instructions on using various aspects of the Argo platform, we turn to our two public-facing communication channels: the Argo blog and the Argo documentation site.

Like everything else, these communication platforms pose their own disadvantages. It can be time-consuming to write up or record material for these sites. Also, the more material that's there, the harder it can be for the stations to find what they need when they have questions.

We created an FAQ on the documentation site to help the stations find answers to the most common questions. And the time invested in producing the documentation and material up-front often saves us time down the road when we can send a link to a post we've made in response to a question from one of the Argo-bloggers.

Many-to-Many Communication

We've consistently found that some of the most valuable communication around the project happens when folks at each station can talk with one another. Yet because of the geographical and topical dispersion of the stations, these can be the hardest interactions to foster. So we continue to seek ways to encourage this, using all of the tools mentioned in this piece.

Webinars offer a regular opportunity for folks at the stations to share lessons about a focused aspect of developing a niche site. Increasingly, we've sought to foster more open-ended conversations among the stations as well -- including regular story calls where a subset of the bloggers share what they're working on, spontaneous brainstorming calls, and check-in conference calls where we discuss how the project is going.

Right now, requests for technical help from bloggers at the stations tend to fall into one of three categories: bug reports (this should work, but doesn't), feature requests (I'd like to be able to do this on the site), and requests for advice (how can I accomplish this in a post?). It's impossible for the bloggers, who don't know the details of how the software works, to determine which is which.

So it would be helpful for us to route all these reports to a common channel, accessible by all, where users can chime in if they're having similar problems or have advice to share on how to accomplish something. To that end, we're working on creating a Stack-Overflow-esque board that would allow the bloggers to discuss issues and solicit advice as a group without the reply-all problems Basecamp poses.

On a few occasions, we've been able to bring the stations together for some of that invaluable person-to-person contact. As Tom Paulson, who blogs for KPLU at Humanosphere, pointed out, in-person communication builds on all the other methods of sharing ideas.

Generally Speaking

For a project as variegated as Argo, there's no one-size-fits-all solution to keeping in touch. The project has unfolded in phases -- hiring reporters, training reporters, building audience, and sustaining growth -- at various rates for each station, and each of those phases has required a different approach to communication.

What's served us best are flexibility and adaptation. Setting up a phone call over Twitter while we trade notes in a PiratePad. Using Basecamp to agree on a time for a webinar that mashes up FreeConference.com with Join.Me.

Although I've mentioned specific tools in this post, I don't think the hodgepodge of software and services we use is the most important takeaway. Instead, my strongest recommendation is this: Be attentive to your communication needs and how well your approaches are serving them, then adjust continuously.

Matt Thompson is an editorial product manager at Project Argo.

February 24 2011

18:18

How to Integrate Social Tools into the Journalism Classroom

It's difficult to deny that social media platforms are changing the face of modern communication. Online tools are a growing part of how news is sourced, published, and consumed. The revolutions in Tunisia and Egypt demonstrated the importance of social media literacy for journalists.

Yet integrating social media into university classrooms can be a daunting task for many journalism educators. Professors are typically required to use clunky online systems for grading and communicating with students. It's an unpleasant experience for everyone involved. These awkward systems don't inspire creativity, enrich collaboration, or instill a passion for experimentation -- all of which are required to survive and succeed in a rapidly changing media industry.

This post will examine a few innovative uses of social media that journalism professors are trying out in the classroom. Not every tool is appropriate for every class, but there are undoubtedly ways in which most instructors can find room for at least some of these ideas.

Facebook

Yes, Facebook can play a significant, positive role in the classroom. And no, professors don't have to become "friends" with their students to make use of it.

Facebook Groups provide a place where students can post ideas, links, and even photos or videos. When one uploads content to a Facebook group, neither the action nor the information shows up on a person's wall. It remains completely within the walls of the group.


The main reason to use a Facebook group is that students are already there. They don't have to remember another log-in or remember to go visit "the class forum." It fits seamlessly into their lives. It takes very little effort to click "like" or add a comment to a classmate's idea. This fact alone encourages more interaction than other platforms.

Groups come in three varieties: open, closed or secret. "Open" groups are public, "closed" groups keep content private but allow others to see their list of members, and "secret" groups won't show up anywhere.

I recommend "closed" groups for classes. This gives students a private space to speak and makes it easy for others to join. Simply send a link to a closed group and others can request to join. (For more on Facebook Groups, read this excellent post by Jen Lee Reeves, who teaches at the University of Missouri.)

Facebook Pages are used by news organizations to share stories and even to find sources for stories. Journalism instructor Staci Baird has her students manage San Francisco Beat as part of the Digital News Gathering class at San Francisco State University.

"I want my students to get used to trying new things, thinking outside the box," she said. Other benefits Baird cited included "real-world experience" and thinking of Facebook in professional terms.

After you create a Page, you can add students as admins by entering their email addresses. This gets around having to add students as friends in order to invite them to participate.

Group Blogs

Blogs are a great way to expose students to online writing and basic web publishing. Students can post assignments for teachers to see, and the overall blog can contribute reporting to the local community.


Tumblr has been in the spotlight recently for its rising popularity. It is elegant in its simplicity, standing somewhere between a Twitter feed and a WordPress blog.

Mashable community manager and social strategist Vadim Lavrusik uses Tumblr as the primary vehicle for the Social Media Skills for Journalists class he teaches at Columbia University.

"Because Tumblr is a social platform, other members of the community are able to follow and keep up," wrote Lavrusik in a recent post.

Each student has his or her own account and can contribute to a collaborative Tumblr that combines everyone's work.

Posterous is similar to Tumblr but has a few key differences. Its signature feature is the ability to post text, photos, or video by simply sending an email. Posterous also offers moderation and group blogs.

Educator Wesley Fryer posted a detailed screencast on setting up a moderated class blog.

Staci Baird also used Posterous for a mobile reporting class. She said some students were able to use smartphone apps while others could still post via email.

WordPress

WordPress is another free blogging platform. There are two ways to set up WordPress blogs. The simplest way is to create an account at WordPress.com. It's fast and free, but also limited in terms of customizing its look and features.

Through WordPress.org, the source code can be downloaded and installed on any independent web server. This opens the door to extensive customization. Because it's open source, it allows web developers to create a rich library of free plug-ins that enhance the core components. Journalism professor Robert Hernandez recommends the BuddyPress plug-in to add social and collaborative features. Plug-ins are not available for WordPress.com accounts.

Some universities may allow WordPress installations on campus servers, but others have more restrictive IT policies. In this case, teachers may need to pay for a domain name and web hosting to run an independent server. It typically costs around $10 per year to register a domain name; server space to host a blog costs around $5 a month.

Hernandez runs his class blogs from a personal web hosting account. Multimedia lecturer Jeremy Rue uses the WP Super Cache plug-in to optimize the server load for self-hosted WordPress blogs.

Social Curation

As newsmakers engage on Twitter and Facebook, it's important that students know how to collect and annotate these messages. Storify, Curated.by and Keepstream all allow users to gather and embed social media messages for use in blog posts and articles.

As I was gathering ideas for this article, I asked journalism educators on Twitter about their use of these tools in the classroom. I collected their responses using Storify.

While Storify and Keepstream are designed around discrete collections of content, Curated.by is geared more toward ongoing curation.

For that reason, I suggest using Curated.by for student coverage of live events or for long-term collaboration. Another useful feature is that it allows multiple contributors to work together on the same collection.

Storify does allow users to share accounts as "editors," but I don't recommend this because it gives students full access to edit all content in each other's accounts. The privacy features in Curated.by allow users to limit access to specific projects.

Collaborative Writing

Google Docs allows multiple contributors to write at the same time and track revisions. This service is simple and popular.

But beyond Google Docs, a cluster of collaborative writing apps may have a more practical use in class. In addition to allowing multiple contributors, they record detailed keystrokes. This means you can replay the entire writing process.

As an example, I used iEtherPad to draft this article (you can watch me write it by pressing play). In math classes, students must show their work. Why not require students to show their writing? (For another example of collaborative writing on iEtherPad, check out this story by Mark Glaser on MediaShift.)

Mind Mapping

Mind mapping, or structured brainstorming, helps organize ideas based on their relationship to other elements. There are several free mind mapping applications, but one in particular offers a useful feature for the classroom: online collaboration. And like the collaborative writing applications, Mind Meister records all actions.

I used Mind Meister to begin a class on multimedia journalism. I asked students to define journalism, describe multimedia, and organize how each element related to the others. The mind map tracked the updates as we talked about various definitions. It was fun for them to interact with each other, and it kept them engaged from their workstations rather than watching me write on a whiteboard.

Experimentation

Journalism educators need to lead by example and experiment. It's OK to try something that doesn't work perfectly. No tool is perfect. In six months, the sites mentioned here will inevitably be upgraded with new features. What's important is inspiring students to apply their journalistic curiosity to exploring how new social tools can further their storytelling.

If you have experience using social services like these in the classroom, I hope you'll share your perspective in the comments.

Nathan Gibbs teaches multimedia journalism as an adjunct instructor for Point Loma Nazarene University and the SDSU Digital and Social Media Collaborative. Gibbs oversees multimedia content as web producer for KPBS, the PBS and NPR affiliate in San Diego. He played a key role in the station's groundbreaking use of social media during the 2007 Southern California wildfires and continues to drive interactive strategy. Gibbs is on Twitter as @nathangibbs and runs Modern Journalist, a blog for journalists exploring multimedia.








November 09 2010

22:02

Inside the NewsHour's Multi-Platform Election Night Bedlam

Elections test how much information a news organization can process, and how quickly and accurately it can share that information with an audience. They're also a good time for news organizations to take stock of how far they've come since the last one, and to try the latest journalistic tools (or gimmicks).

Four years ago, YouTube was nascent and Facebook had finally opened up to everyone. By 2008, Twitter was taking off and web video was becoming more commonplace. This year, as Poynter noted, the iPad and live-streaming proved to be the 2010 election's focal points for journalism innovation, but the technology and implementation obviously have a ways to go.

At the PBS NewsHour, we'd already had plenty of time to experiment with the tools we implemented this Election Day, and things went rather well as a result. Below is a look at the different strategies and technologies we used in our election coverage last week, along with some observations about what did and didn't work.

Live-Stream at Center of Vote 2010 Plans

As was the case two years ago when the NewsHour's web and broadcast staffs were mostly separate operations, planning for 2010 Election Day coverage began months ago at the unified and rebranded PBS NewsHour.

Over the past year, the Haitian earthquake, the Fort Hood massacre and the Gulf oil spill taught staffers to operate in a more platform-neutral manner: Information is gathered and triaged to see what works best for web and broadcast audiences, and sometimes both. Vote 2010, however, was the first planned news event to truly test how our staff could concurrently serve our audiences on TV, mobile devices and on the web, as this video outlined:

We had a monumental TV task ahead this year because we were taping broadcasts at our regular time (6 p.m. ET) and adding 7 and 9 p.m. "turnarounds" for other time zones. As in past years, we opted to host a late-night election special to be fed to PBS stations. This year, the NewsHour started taping at 10 p.m., feeding the first hour exclusively to a livestream, then continuing at 11 p.m. both as a livestream and feed to stations.

We also put more effort than ever before into spreading the word about our free live-stream. As part of pre-election social media and PR outreach, we spent a few hundred bucks to sponsor an ONA DC Meetup to kick off the sold-out Online News Association conference. We publicized there and to our PBS colleagues that we were giving away our high-quality election night livestream.

Thanks to a combination of outreach to established partners and cold-calling other media and bloggers that might want an election video presence, we increased the reach of the NewsHour's live-stream by having it hosted elsewhere including local PBS stations, the Sunlight Foundation, AARP, Breitbart and Huffington Post.

We also hosted a map with live AP election data on our site and combined it with our map-centric Patchwork Nation collaboration. We used CoveritLive to power a live-blog of results, analysis and reports from the field. Extra, the NewsHour's site for students and teachers, solicited opinion pieces from students in Colorado, Wisconsin and Florida on topics ranging from why they back specific candidates to why young people should care about voting and whether young voters are informed enough to cast a ballot.

Collaboration via Google Docs

Thankfully, many of the tools we experimented with to cover the 2008 election -- Google Docs, Twitter, Facebook -- have since matured as newsroom resources. Except for a few momentary hiccups, Twitter was as stable as we could have hoped on Election Day.


Two years ago, Google Docs had a clunkier feel. If two people were in the same document, both would have to click save repeatedly to quickly see updates added by the other. But upgrades have since fulfilled some of the instantaneous collaborative promise (and hype) of the now-crested Google Wave.

On election night, more than a dozen NewsHour staffers worked in the same text document in real-time -- filing reports from the field and transcribing quotes from NewsHour analysts and notable guests on other networks. In a different spreadsheet, staff kept track of which races were called by other news organizations and when. We also used the embedded chat feature in Google Docs to communicate while editing and adding information.

Unlike two years ago, I could copyedit a report still being typed by my colleague, Mike Melia, several miles away at the Democrats' election HQ in Washington. We worked out ways of communicating within the document in order to speed up the process. For example, when he typed a pound sign (#), that signaled the paragraph was ready and I immediately pasted it into CoveritLive.

The instant that major races were called by one of our senior producers, reporter-producer Terence Burlij alerted our control room via headset then added a Congressional balance of power update to our liveblog.

In-House Innovations

Our graphics department and development team cranked out numerous innovations to serve the election demands of the website and our five hours of breaking news broadcasts. As Creative Director Travis Daub put it:

Katie Kleinman and Vanessa Dennis crunched the AP data and built a truly innovative system that dynamically generates a graphic for every race on the ticket. Thanks to their efforts, we were able to call up any race with accurate data in a matter of seconds. I venture to bet that we were the only network last night with an election graphics system running in Google Chrome.

Those same graphics of more than 450 candidates and races were available in a matter of seconds for use on the web, but we opted not to use them since the vote tallies changed so quickly.

Traffic Numbers

Creating a valuable-yet-free live-stream and quickly posting concession and victory speeches onto our YouTube channel, live-blog and Facebook appears to have paid off in terms of traffic.


Thanks to our partners at Ustream, who helped us stream 516 years' worth of oil spill footage earlier this year, we were able to attract a sizable audience for our special election live-stream, in large part due to them posting a giant promotion on their home page for a full day. Our election live-stream garnered more than 250,000 views, more than 141,000 of which were unique.

We also notified our 73,000 iPhone app users of our special coverage plans, and more than 8,300 used the app to view our election coverage and/or live-stream. Our app download traffic tripled on Election Day, and pushed us to the brink of 100,000 app users.

As for Facebook, we were blown away by the breaking news engagement we got. It has us reconsidering our strategy and planning to post more breaking news content for our Facebook audience. A separate two-day effort targeting NewsHour ads on Facebook pages of specific political campaigns grew our fans about 7.3 percent in that short period.

What We Learned

So what were the major takeaways from this latest election season?

  • Earlier, Wider Promotions -- Our social media and promotions teams landed our elections coverage some great placements and media mentions this year. In 2012, we'll start our outreach to potential partners and local stations even earlier, and do more promotion on-air, online and on mobile devices and with whatever new tools or services crop up between now and then.
  • Be All Things to All Visitors -- Every person who visits our site seeks a different mixture of information. Some want the latest election returns, some want smart analysis of what's transpiring and some want to watch the NewsHour broadcast or victory and concession speeches. We'll continue to feature all of that, but we'll improve how quickly they can find the specific information they want.
  • Practice Makes Perfect -- Just when you think the staff's last pre-election live-blog rehearsal has perfected your workflow, one tiny detail proves you ever-so-wrong on the big night. The last two things I did on election night before heading home were to click "end event" on CoveritLive and then check the home page. It turns out that ending the event -- instead of leaving it on hiatus as we'd done in practice runs -- transformed what had been a reverse-chronological live-blog into a chronological one. At 3 a.m., we suddenly had news from 5:45 p.m. at the top of our homepage. I got Art Director Vanessa Dennis out of bed, but neither of us could find a quick-fix solution. We disabled the live-blog home page feed and I reworked some live-blog content into a short blog post summing up the night's biggest developments that could hold until our politics team posted the Morning Line dispatch a few hours later. Lesson learned.

The tone was mostly upbeat at our election coverage post-mortem meeting. We then realized the Iowa caucuses are just 14 months away -- so election planning will be front and center once again very soon.

Dave Gustafson is the PBS NewsHour's online news and planning editor. He mostly edits copy and multimedia content for The Rundown news blog and homepage, but his jack-of-all-trades duties also involve partnerships, SEO, social media, widgets, livestreaming, freebies and event planning.


September 28 2010

16:39

Net2 Recommends - September's Interesting Posts From Around The Web

The NetSquared team reads and shares lots of different blog posts, articles, reports, and surveys within our team. We have a lot of fun sharing within the team and it occurred to us that we should start sharing them with you, too! Net2 Recommends is a monthly series of news and blog posts from around the web that we found interesting or inspiring, mind-bending or opinion-changing, fun or just plain weird.

