Tumblelog by Soup.io

January 28 2011

16:44

Ruby screen scraping tutorials

Mark Chapman has been busy translating our Python web scraping tutorials into Ruby.

The set now includes three tutorials on writing basic screen scrapers, plus extra ones on scraping .ASPX pages, Excel files and CSV files.

We’ve also installed some extra Ruby modules – spreadsheet and FastCSV – to make them possible.

These Ruby scraping tutorials are made using ScraperWiki, so you can of course do them from your browser without installing anything.
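
As a rough illustration of the CSV side of those tutorials, here is a minimal sketch of turning CSV text into rows in Ruby. It assumes Ruby's standard csv library rather than the module named above, and the sample data and the rows_from_csv helper are made up for this example; they are not taken from the ScraperWiki tutorials.

```ruby
require 'csv'

# Hypothetical sample data standing in for a downloaded CSV file.
SAMPLE_CSV = <<~DATA
  name,city,visits
  Alice,London,3
  Bob,Leeds,5
DATA

# Parse CSV text into an array of hashes keyed by the header row.
def rows_from_csv(text)
  CSV.parse(text, headers: true).map(&:to_h)
end

rows = rows_from_csv(SAMPLE_CSV)
rows.each { |row| puts "#{row['name']} (#{row['city']}): #{row['visits']}" }
```

Values come back as strings, so a scraper would convert numeric columns itself before saving them.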

Thanks Mark!


September 15 2010

13:37

Using Varnish So News Doesn't Break Your Server

Here's a look at how we use Varnish, an HTTP cache, to keep NYTimes.com scaling smoothly.

July 20 2010

22:06

How to Get Started With Github and Release a Gem Using Jeweler

These are quick notes I’m sharing with the NYC Ruby Women’s group, which I organize. One of my developer friends, Peter Harkins, recommended I share them with the world at large, so here they are.

More about Ruby and the NYC Ruby Women’s group in a bit.

SOME HELPFUL LINKS
http://rubygems.org – official repository
http://ruby-toolbox.com/ – shows the most popular Ruby gems (how many people have looked at, downloaded and forked each one)
http://railsplugins.org/ – compatibility tracking of plugins and gems (what works with various versions of Ruby and Rails 3)

SOME HELPFUL RUBY COMMANDS:
gem update --system – updates RubyGems itself (plain "gem update" updates your installed gems)
gem environment gemdir – displays the system directory for gems
gem help – basic help directory
gem env – shows the Ruby gem environment
gem list – lists installed gems. You can add part of a gem name afterward to narrow the list.
gem cleanup – deletes old gem versions
rake -T – lists the available Rake tasks
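
Some of the same information is available from inside Ruby through the RubyGems library. The sketch below is only an approximation of what the commands above print; the installed_gems helper and its fragment argument are names made up for this example.

```ruby
require 'rubygems'

# Roughly what `gem environment gemdir` prints: the default gem directory.
gem_dir = Gem.dir

# The version of RubyGems itself (what `gem update --system` upgrades).
rubygems_version = Gem::VERSION

# Roughly what `gem list` shows: names of installed gems,
# optionally narrowed to names containing the given fragment.
def installed_gems(fragment = nil)
  names = Gem::Specification.map(&:name).uniq.sort
  fragment ? names.select { |name| name.include?(fragment) } : names
end

puts "Gem directory: #{gem_dir}"
puts "RubyGems version: #{rubygems_version}"
puts "Installed gems: #{installed_gems.length}"
```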

====================================
====================================
Let’s get started…

INSTALL THESE GEMS (or check if you have them already):
(You may need/want to add “sudo” [no quotes] in front of each of these commands to install)
gem install rubygems-update
gem install thoughtbot-shoulda – Read Me at: http://github.com/thoughtbot/shoulda
gem install rspec-rails
gem install jeweler – Read Me at: http://github.com/technicalpickles/jeweler

============
Establish version control:
DOWNLOAD GIT:

http://git-scm.com/download

CONFIGURE YOUR LOCAL SYSTEM TO TALK TO GITHUB (once you’ve established an account at http://github.com)
git config --global user.name "Real Name"
git config --global user.email "youremail@foo.com"
git config --global github.user username

SET UP YOUR PUBLIC KEY (See http://help.github.com/mac-key-setup/ (or your OS) for details)
Check if you have a key: cat ~/.ssh/id_rsa.pub

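If you DO have a key and want to start over with a fresh one, back up and remove the old one first:
$ cd ~/.ssh
$ ls
config id_rsa id_rsa.pub known_hosts
$ mkdir key_backup
$ cp id_rsa* key_backup
$ rm id_rsa*

If you DON’T have a key (or you just removed the old one), then create one:
ssh-keygen -t rsa -C "youremail@foo.com"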
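
The "do I already have a key?" check can also be done from Ruby. This is a minimal sketch, assuming keys live in ~/.ssh and end in .pub; the public_keys helper is a made-up name for this example.

```ruby
# Return the paths of any public key files (*.pub) found in ssh_dir.
def public_keys(ssh_dir = File.join(ENV.fetch('HOME', '.'), '.ssh'))
  Dir.glob(File.join(ssh_dir, '*.pub')).sort
end

keys = public_keys
if keys.empty?
  puts 'No public key found; create one with ssh-keygen.'
else
  puts "Found: #{keys.join(', ')}"
end
```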

============

CREATE A GEM ON YOUR LOCAL SYSTEM USING JEWELER. (We’re calling our test gem “dabeers.”)
jeweler dabeers --rspec --rdoc --create-repo

#! If there’s a FileUtils problem (this may happen if you’re running Ruby 1.8.6), then:
#! mate /Library/Ruby/Gems/1.8/gems/jeweler-1.4.0/
#! add require 'fileutils' to generator.rb (if that’s the error)
#! jeweler dabeers --rspec --rdoc
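
For a sense of what a gem boils down to, here is a hand-written sketch of a gem specification for "dabeers" using RubyGems' own Gem::Specification class. It is not the file Jeweler generates; the summary, author and email values are placeholders chosen for this example.

```ruby
require 'rubygems'

# A hand-written specification for the test gem "dabeers".
# Summary, author and email are placeholder values, not from these notes.
spec = Gem::Specification.new do |s|
  s.name    = 'dabeers'
  s.version = '0.0.0'
  s.summary = 'Placeholder summary for the dabeers test gem'
  s.authors = ['Real Name']
  s.email   = 'youremail@foo.com'
  s.files   = [] # a real gem would list its lib/ files here
end

puts "#{spec.name} #{spec.version}"
```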

VERSION YOUR GEM:
rake version:write
#! Since it’s our first rake, the version is set to 0.0.0. If you wanted something different for your initial version, write: rake version BUILD=alpha1 [or change "alpha1" to another word or number, without quotes]

UPDATE VERSIONS AS YOU UPDATE YOUR GEM:
rake version:bump:major
rake version:bump:minor
rake version:bump:patch
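
To make the three bumps concrete, here is a small sketch of what each one does to a "major.minor.patch" version string. It assumes the usual convention that bumping a part resets the parts to its right to zero; the bump method is written for this example and is not Jeweler's own code.

```ruby
# Bump one part of a "major.minor.patch" version string.
# Assumes the lower parts reset to 0 when a higher part is bumped.
def bump(version, part)
  major, minor, patch = version.split('.').map(&:to_i)
  case part
  when :major then "#{major + 1}.0.0"
  when :minor then "#{major}.#{minor + 1}.0"
  when :patch then "#{major}.#{minor}.#{patch + 1}"
  else raise ArgumentError, "unknown part: #{part.inspect}"
  end
end

puts bump('0.0.0', :patch) # prints 0.0.1
puts bump('0.0.1', :minor) # prints 0.1.0
puts bump('0.1.0', :major) # prints 1.0.0
```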

COMMIT TO GITHUB:
rake github:release

COMMIT TO GEMCUTTER:
rake gemcutter:release

Thanks to the NYC Ruby Meetup for the intro to Jeweler and Gemcutter and Peter Harkins for QA of these notes.


July 13 2010

17:01

Behind the Scenes of a Live World Cup

This year's World Cup coverage put a heavy emphasis on live, in-game updates and analysis. Here's a look at some of the data and processes involved.

July 01 2010

15:58

Introducing the Districts API

Now that we're building the next version of Represent, it's a good time to unbundle some of the original application into reusable parts -- such as the new Districts API.

June 10 2010

16:26

RailsConf 2010: Dig Into the Code

We've been at RailsConf in Baltimore this week, checking out the talks and demos.

May 25 2010

15:02

Building a Better Submission Form

Our new photo uploader, called Stuffy, uses a NoSQL storage engine for speed and flexibility.

December 10 2009

13:50

DocumentCloud Releases More Code, Continues to Attract Developer Interest

A public beta of DocumentCloud, one that journalists can kick the tires on and upload documents to, won't be ready for a few more months, but work is continuing apace in our corner of the cloud. We've released a handful of the components that make up our big picture, and it is great to see how well received our work has been by the Ruby and JavaScript communities. Last week we hit a little milestone: more than 1,000 developers are watching DocumentCloud projects on GitHub, which is pretty cool. The advantage for us is that many of these developers are actually trying out our software releases and helping us make them stronger.

Gregg Pollack included a great review of CloudCrowd in a recent episode of his show, Scaling Rails. CloudCrowd will still be Greek to the truly non-technical readers out there, but if you have enough of a handle on software development to wish you understood "scaling" better, his review just might help.

Our latest release, Docsplit, is a command-line utility and Ruby library for splitting documents into distinct components such as raw text (which you need for searches), page thumbnails, and document metadata (details like the document's author or the number of pages it contains). Splitting documents apart is a pretty key functionality for DocumentCloud: everything else DocumentCloud does depends on the presence of one or another of these pieces. Docsplit got a lot of attention when we released it on Monday -- and we're all looking forward to seeing what other folks do with it.
