August 09 2012

12:19

Two reasons why every journalist should know about scraping (cross-posted)

This was originally published on Journalism.co.uk – cross-posted here for convenience.

Journalists rely on two sources of competitive advantage: being able to work faster than others, and being able to get more information than others. For both of these reasons, I love scraping: it is both a great time-saver, and a great source of stories no one else has.

Scraping is, simply, getting a computer to capture information from online sources. They might be a collection of webpages, or even just one. They might be spreadsheets or documents which would otherwise take hours to sift through. In some cases, it might even be information on your own newspaper website (I know of at least one journalist who has resorted to this as the quickest way of getting information that the newspaper has compiled).

In May, for example, I scraped over 6,000 nomination stories from the official Olympic torch relay website. It allowed me to quickly find both local feelgood stories and rather less positive national angles. Continuing to scrape also led me to a number of stories which were being hidden, while having the dataset to hand meant I could instantly pull together the picture of a single day on which one unsuccessful nominee would have run, and I could test the promises made by organisers.

ProPublica scraped payments to doctors by pharma companies; the Ottawa Citizen ran stories based on its scrape of health inspection reports. In Tampa Bay they run an automatically updated page on mugshots. And it’s not just about the stories: last month local reporter David Elks was using Google spreadsheets to compile a table from a Word document of turbine applications for a story which, he says, “helped save the journalist probably four or five hours of manual cutting and pasting.”

The problem is that most people imagine that you need to learn a programming language to start scraping - but that’s not true. It can help - especially if the problem is complicated. But for simple scrapers, something as easy as Google Docs will work just fine.

I tried an experiment with this recently at the News:Rewired conference. With just 20 minutes to introduce a room full of journalists to the complexities of scraping, and get them producing instant results, I used some simple Google Docs functions. Incredibly, it worked: by the end The Independent’s Jack Riley was already scraping headlines (the same process is outlined in the sample chapter from Scraping for Journalists).

And Google Docs isn’t the only tool. Outwit Hub is a must-have Firefox plugin which can scrape through thousands of pages of tables, and even Google Refine can grab webpages too. Database scraping tool Needlebase was recently bought by Google, too, while Datatracker is set to launch in an attempt to grab its former users. Here are some more.

What’s great about these simple techniques, however, is that they can also introduce you to concepts which come into play with faster and more powerful scraping tools like Scraperwiki. Once you’ve become comfortable with Google spreadsheet functions (if you’ve ever used =SUM in a spreadsheet, you’ve used a function) then you can start to understand how functions work in a programming language like Python. Once you’ve identified the structure of some data on a page so that Outwit Hub could scrape it, you can start to understand how to do the same in Scraperwiki. Once you’ve adapted someone else’s Google Docs spreadsheet formula, then you can adapt someone else’s scraper.
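
To make that parallel concrete, here is a minimal sketch (illustrative only, not from the book or the original post) of the same "fetch a page and pull out the interesting bits" idea written as a Python function, using the urllib and lxml libraries that appear elsewhere on this page. The URL and the CSS selector are placeholders.

import urllib
import lxml.html

def scrape_headlines(url, selector="h2 a"):
    # Fetch the page and return the text of every element matching the selector
    html = urllib.urlopen(url).read()
    root = lxml.html.fromstring(html)
    return [el.text_content().strip() for el in root.cssselect(selector)]

for headline in scrape_headlines("http://www.example.com/news"):
    print headline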

I’m saying all this because I wrote a book about it. But, honestly, I wrote a book about this so that I could say it: if you’ve ever struggled with scraping or programming, and given up on it because you didn’t get results quickly enough, try again. Scraping is faster than FOI, can provide more detailed and structured results than a PR request – and allows you to grab data that organisations would rather you didn’t have. If information is a journalist’s lifeblood, then scraping is becoming an increasingly key tool to get the answers that a journalist needs, not just the story that someone else wants to tell.

August 02 2012

14:41

Hate transcribing audio? Crowdsource it instead

For journalists who don't mind getting their hands dirty, Amazon's Mechanical Turk service can be a cost-effective way to avoid one of the least thrilling parts of the reporting process: transcribing.

April 20 2012

06:26

Programming and journalism students: A conversation

I think it’s pretty cool to use Storify to sort out the threads of a bunch of simultaneous conversations on Twitter:

[View the story "Programming and journalism students: A conversation" on Storify]

Please join in — on Twitter, on Facebook, or here.

January 20 2012

09:27

How to stop missing the good weekends

[Image: The BBC's Michael Fish presenting the weather in the 80s, with a ScraperWiki tractor superimposed over Liverpool]

Far too often I get so stuck into the work week that I forget to monitor the weather for the weekend when I should be going off to play on my dive kayaks — an activity which is somewhat weather dependent.

Luckily, help is at hand in the form of the ScraperWiki email alert system.

As you may have noticed, when you do any work on ScraperWiki, you start to receive daily emails that go:

Dear Julian_Todd,

Welcome to your personal ScraperWiki email update.

Of the 320 scrapers you own, and 157 scrapers you have edited, we
have the following news since 2011-12-01T14:51:34:

Histparl MP list - https://scraperwiki.com/scrapers/histparl_mp_list :
  * ran 1 times producing 0 records from 2 pages
  * with 1 exceptions, (XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<!DOCTYP')

...Lots more of the same

This concludes your ScraperWiki email update till next time.

Please follow this link to change how often you get these emails,
or to unsubscribe: https://scraperwiki.com/profiles/edit/#alerts

The idea behind this is to attract your attention to matters you may be interested in — such as fixing those poor dear scrapers you have worked on in the past and are now neglecting.

As with all good features, this was implemented as a quick hack.

I thought: why design a whole email alert system, with special options for daily and weekly emails, when we already have a scraper scheduling system which can do just that?

With the addition of a single flag to designate a scraper as an emailer (plus a further 20 lines of code), a new fully fledged extensible feature was born.

Of course, this is not counting the code that is in the Wiki part of ScraperWiki.

The default code in your emailer looks roughly like so:

import scraperwiki
emaillibrary = scraperwiki.utils.swimport("general-emails-on-scrapers")
subjectline, headerlines, bodylines, footerlines = emaillibrary.EmailMessageParts("onlyexceptions")
if bodylines:
    print "
".join([subjectline] + headerlines + bodylines + footerlines)

As you can see, it imports the 138 lines of Python from general-emails-on-scrapers, which I am not here to talk about right now.

Using ScraperWiki emails to watch the weather

Instead, what I want to explain is how I inserted my Good Weather Weekend Watcher by polling the weather forecast for Holyhead.

My extra code goes like this:

import datetime
import urllib
import lxml.html

weatherlines = [ ]
if datetime.date.today().weekday() == 2:  # Wednesday
    url = "http://www.metoffice.gov.uk/weather/uk/wl/holyhead_forecast_weather.html"
    html = urllib.urlopen(url).read()
    root = lxml.html.fromstring(html)
    rows = root.cssselect("div.tableWrapper table tr")
    for row in rows:
        #print lxml.html.tostring(row)
        metweatherline = row.text_content().strip()
        if metweatherline[:3] == "Sat":
            subjectline += " With added weather"
            weatherlines.append("*** Weather warning for the weekend:")
            weatherlines.append("   " + metweatherline)
            weatherlines.append("")

What this does is check if today is Wednesday (day of the week #2 in Python land), then it parses through the Met Office Weather Report table for my chosen location, and pulls out the row for Saturday.

Finally we have to handle producing the combined email message, the one which can contain either a set of broken scraper alerts, or the weather forecast, or both.

if bodylines or weatherlines:
    if not bodylines:
        headerlines, footerlines = [ ], [ ]   # kill off cruft surrounding no message
    print "
".join([subjectline] + weatherlines + headerlines + bodylines + footerlines)

The current state of the result is:

*** Weather warning for the weekend:
  Mon 5Dec
  Day

  7 °C
  W
  33 mph
  47 mph
  Very Good

This was a very quick low-level implementation of the idea with no formatting and no filtering yet.

Email alerts can quickly become sophisticated and complex. Maybe I should only send a message out if the wind is below a certain speed. Should I monitor previous days’ weather to predict whether the sea will be calm? Or should I check the wave heights on the offshore buoys? Perhaps my calendar should be consulted for prior engagements so I don’t get frustrated by being told I am missing out on a good weekend when I had promised to go to a wedding.
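
For instance, a wind-speed filter might look something like this (a hypothetical sketch, not part of the original scraper; the 20 mph threshold and the "NN mph" text format are assumptions based on the output above):

import re

WIND_LIMIT_MPH = 20   # hypothetical cut-off for a kayakable weekend

def calm_enough(metweatherline, limit=WIND_LIMIT_MPH):
    # Pull every "NN mph" figure out of the scraped row and compare the
    # largest against the threshold; if no speeds are found, let it through
    speeds = [int(mph) for mph in re.findall(r"(\d+)\s*mph", metweatherline)]
    return not speeds or max(speeds) <= limit

# ...then, inside the existing loop over rows:
#     if metweatherline[:3] == "Sat" and calm_enough(metweatherline):
#         weatherlines.append("*** Weather warning for the weekend:")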

The possibilities are endless and so much more interesting than if we’d implemented this email alert feature in the traditional way, rather than taking advantage of the utterly unique platform that we happened to already have in ScraperWiki.


October 23 2010

00:28

Which scripting language for a novice who wants to get into Google/Bing Mapping APIs?

I have a journo background, and am a programming novice (except for a single undergrad PASCAL course in 1994). My goal is to learn how to use the Google and/or Bing Maps APIs, but first, to get a handle on a scripting language like Java or Python. My question: which of these languages is most useful in the context of a Maps API?

From what I've read, Python is a great language to get started with for a novice programmer, but when I read about the Google Maps API, a knowledge of Java always seems assumed. Is Java needed for working with Google Maps API? With Bing Maps' API? Can either of these APIs adapt to different scripts? As you can see, I'm a little confused.

Thanks in advance.

June 29 2010

23:01

What's the fastest way to compare two large CSVs against each other?

So here's the nut. Imagine you've got two huge CSV files, snapshots from the same database taken on different days.

They share a common identifier, and always have the same set of fields. But there are amendments, omissions and additions made between snapshots that can only be detected by comparing records against each other. What's the fastest way to loop through a fresh snapshot and compare it against the previous snapshot for changes, additions and omissions?

Below is a roughed out Python routine I've written with a fake data set. Basically, it sets the unique ID as the key to a dictionary that contains what amounts to a CSV DictReader pull from two imaginary CSV files: one with the older "existing" snapshot; and another with the newer "fresh" snapshot.

It seems to work okay in testing, but when you run it over a large data set, it goes pretty darn slow. I'm curious whether anyone here knows a way I can make it run quicker.

existing_dict = {
    'A1': {'name': 'John', 'gender': 'M', 'value': 10},
    'A2': {'name': 'Jane', 'gender': 'F', 'value': 10},
    'A3': {'name': 'Josh', 'gender': 'M', 'value': 20},
    'A4': {'name': 'John', 'gender': 'M', 'value': 20},
    'A5': {'name': 'Janet', 'gender': 'F', 'value': 15},
    'A6': {'name': 'January', 'gender': 'F', 'value': 10},
}

fresh_dict = {
    'A2': {'name': 'Jane', 'gender': 'F', 'value': 10},
    'A3': {'name': 'Josh', 'gender': 'M', 'value': 20},
    'A4': {'name': 'John', 'gender': 'M', 'value': 20},
    'A5': {'name': 'Janet', 'gender': 'F', 'value': 15},
    'A6': {'name': 'January', 'gender': 'F', 'value': 5},
    'A7': {'name': 'Jessica', 'gender': 'F', 'value': 10},
}

def compare():
    """
    Compares two data dicts against each other.
    """
    # Set some counters to report outcome
    nochanges, amendments, omissions = 0,0,0

    # Loop through the existing data...
    for id_, existing_data in existing_dict.items():
        # Try to find the corresponding record in the fresh data
        fresh_data = fresh_dict.get(id_, None)
        # If it's there...
        if fresh_data:
            # Determine if there are any changes
            if is_diff(existing_data, fresh_data):
                amendments += 1
            else:
                nochanges += 1
            del fresh_dict[id_]
        else:
            omissions += 1
    additions = len(fresh_dict.keys())
    return nochanges, amendments, omissions, additions

def is_diff(existing_row, fresh_row):
    change_list = [field for field in existing_row.keys()
        if existing_row.get(field) != fresh_row.get(field)]
    if change_list:
        return True
    return False

if __name__ == '__main__':
    print "No change:%s; Amendments:%s; Omissions:%s; Additions:%s;" % compare()

That's pretty much it. Here's what the imaginary CSV files might look like, if it helps.

existing.csv

id,name,gender,value
A1,John,M,10
A2,Jane,F,10
A3,Josh,M,20
A4,John,M,20
A5,Janet,F,15
A6,January,F,10

fresh.csv

id,name,gender,value
A2,Jane,F,10
A3,Josh,M,20
A4,John,M,20
A5,Janet,F,15
A6,January,F,5
A7,Jessica,F,10
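
One way to tighten the comparison is to diff the key sets up front and compare whole records with a single equality check instead of a per-field loop; whether that is meaningfully faster on a real snapshot would need measuring. A rough sketch using the same dictionaries as above:

def compare_with_sets(existing, fresh):
    # Work out omissions and additions from the key sets alone
    existing_keys, fresh_keys = set(existing), set(fresh)
    omissions = len(existing_keys - fresh_keys)
    additions = len(fresh_keys - existing_keys)
    # For shared keys, compare the whole record dicts in one go
    nochanges = amendments = 0
    for id_ in existing_keys & fresh_keys:
        if existing[id_] == fresh[id_]:
            nochanges += 1
        else:
            amendments += 1
    return nochanges, amendments, omissions, additions

print "No change:%s; Amendments:%s; Omissions:%s; Additions:%s;" % compare_with_sets(existing_dict, fresh_dict)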

June 07 2010

17:12

Favorite Google Analytics library for coding with?

I've played with a couple of Google Analytics APIs to try pulling data, and have found the official one pretty darn confusing. I'm using Python, but figured other journohackers might find a list of recommended Google Analytics libraries useful.

June 02 2010

20:42

Why Journalists Should Learn Computer Programming

Yes, journalists should learn how to program. No, not every journalist should learn it right now -- just those who want to stay in the industry for another ten years. More seriously, programming skills and knowledge enable us traditional journalists to tell better and more engaging stories.

Programming means going beyond learning some HTML. I mean real computer programming.

As a journalist, I'm fully aware of the reasons why we don't learn programming -- and I'm guilty of using many of them. I initially thought there were good reasons not to take it up:

  • Learning to program is time-consuming. One look at the thick books full of arcane code and you remember why you became a journalist and not a mathematician or an engineer. Even if you are mathematically inclined, it's tough to find the time to learn all that stuff.
  • Your colleagues tell you you don't need it -- including the professional developers on staff. After all, it took them years of study and practice to become really good developers and web designers, just like it takes years for a journalist to become experienced and knowledgeable. (And, if you start trying to code, the pros on staff are the ones who'll have to clean up any mess you make.)
  • Learning the basics takes time, as does keeping your skills up to date. The tools change all the time. Should you still bother to learn ActionScript (Flash), or just go for HTML5? Are you sure you want to study PHP and not Python?
  • Why learn programming when there are so many free, ready-made tools online: Quizzes, polls, blogs, mind maps, forums, chat tools, etc. You can even use things like Yahoo Pipes to build data mashups without needing any code.
  • When Megan Taylor wrote for MediaShift about the programmer-journalist, she asked around for the perfect skillset. One response nearly convinced me to never think about programming ever again: "Brian Boyer, a graduate of Medill's journalism for programmers master's track and now News Applications Editor at the Chicago Tribune, responded with this list: XHTML / CSS / JavaScript / jQuery / Python / Django / xml / regex / Postgres / PostGIS / QGIS."

Those are some of the reasons why I thought I could avoid learning programming. But I was so wrong.

Why Journalists Should Program

You've heard the reasons not to start coding. Now here's a list of reasons why you should:

  • Every year, the digital universe around us becomes deeper and more complex. Companies, governments, organizations and individuals are constantly putting more data online: Text, videos, audio files, animations, statistics, news reports, chatter on social networks... Can professional communicators such as journalists really do their job without learning how the digital world works?
  • Data are going mobile and are increasingly geo-located. As a result, they tell the stories of particular neighborhoods and streets and can be used to tell stories that matter in the lives of your community members.
  • People have less time, and that makes it harder to grab their attention. It's essential to look for new narrative structures. Programming enables you to get interactive and tell non-linear stories.

  • You don't have to build everything from scratch. Let's take JavaScript, which is used for creating dynamic websites. Tools such as jQuery, a cross-browser JavaScript library, enable people to create interactivity with less effort. Web application frameworks such as Ruby on Rails and Django support the development of dynamic sites and applications. So it can be easier than you thought.

A Way of Looking At the World

Maybe you're not yet convinced. Even though jQuery makes your life easier, you still need a decent knowledge of JavaScript, CSS and HTML. Django won't help you if you never practiced Python. All of this takes time, and maybe you'll never find enough of it to get good at all this stuff.

Still, we must try. The good news is that it doesn't matter if you become proficient at the latest language. What is important, however, is that you're able to comprehend the underpinnings of programming and interactivity -- to be able to look at the world with a coder's point of view.

I'm still just a beginner, but I feel that this perspective provides you with an acute awareness of data. You start looking for data structures, for ways to manipulate data (in a good sense) to make them work for your community.

When covering a story, you'll think in terms of data and interactivity from the very start and see how they can become part of the narrative. You'll see data everywhere -- from the kind that floats in the air thanks to augmented reality, to the more mundane version contained in endless streams of status updates. Rather than being intimidated by the enormous amount of data, you'll see opportunities -- new ways to bring news and information to the community.

You probably won't have time to actually do a lot of the programming and data structuring yourself. But now you're equipped to have a valuable and impactful conversation with your geek colleagues. A conversation that gets better results than ever before.

So, even though it's probably a bit late for me to attend the new joint Master of Science degree program in Computer Science and Journalism at Columbia University, I can still learn How to Think Like a Computer Scientist using the free MIT OpenCourseWare, take part in the Journalists/Coders Ning network, and find help at Help.HacksHackers.Com.

And so can you.

******

Are you a journalist who has taken up programming? A programmer with advice for journalists? Please share your experiences and insights in the comments.

Roland Legrand is in charge of Internet and new media at Mediafin, the publisher of leading Belgian business newspapers De Tijd and L'Echo. He studied applied economics and philosophy. After a brief teaching experience, he became a financial journalist working for the Belgian wire service Belga and subsequently for Mediafin. He works in Brussels, and lives in Antwerp with his wife Liesbeth.

May 20 2010

19:24

Help parsing a comment file into CSV

On NICAR-L, someone asked the following question. I am posting it here because I wanted the answer to be archived on the web.

folks, i've got a large text file with entries along these lines, tho sometimes the center text has more than one return:

schfish on blah blah blah (return)
text text text text text (return)
10:31 a.m. on May 20, 2010 (return)

i want to separate these things so the first line is in one field, the second in another and the third in a third field... right now i'm using TextWrangler on a Mac, but could use other stuff too, except Microsoft...

thanks for any help!

In a follow-up, it was clarified that the number of lines between the attribution and the dateline is variable.
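
For the record, one way to attack it (a hedged sketch, not the NICAR-L answer): treat the timestamp line as the end of a record and fold whatever sits between the attribution and the timestamp into a single field. The filenames and the timestamp pattern below are assumptions based on the example above.

import csv
import re

# Matches lines like "10:31 a.m. on May 20, 2010"
TIMESTAMP = re.compile(r"^\d{1,2}:\d{2} [ap]\.m\. on \w+ \d{1,2}, \d{4}$")

def parse_comments(lines):
    rows, record = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record.append(line)
        if TIMESTAMP.match(line):
            # First line is the attribution, last is the timestamp,
            # everything in between is the (possibly multi-line) comment text
            rows.append([record[0], " ".join(record[1:-1]), record[-1]])
            record = []
    return rows

with open("comments.txt") as infile:
    rows = parse_comments(infile)
with open("comments.csv", "wb") as outfile:
    csv.writer(outfile).writerows(rows)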

February 23 2010

20:41

A Gentle Introduction to Google App Engine

As part of our roll-out of version 3 of the NYT Congress API, I was tasked with coming up with a sample application that uses the API to do something mildly interesting, or at least functional. I had gotten a book on Google App Engine for my birthday and was pretty excited to see that some of the basic philosophies of Django were either incorporated directly into GAE or were easy to adapt to it. So when I started on the sample app, I picked GAE and dove in.

App Engine’s Python runtime, unsurprisingly, sticks pretty close to the language’s core tenets: it uses YAML files for configuration (hey, it’s whitespace!) and can run pretty much an entire app using just 2-3 files. A NYT colleague, Derek Gottfrid, built a sample app for our article search API comprising five files, including the README. Yes, it violates the separation of logic and design that most frameworks try to respect, but it works.

GAE provides the basic building blocks a lot of Web apps need, nearly all optional: a backend in Datastore, a URL Fetch service that is wrapped by Python’s familiar urllib and urllib2 libraries, mail and messaging services and memcache. Webapp is a basic framework for building apps not exactly like Django but not so unfamiliar, either.
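
For a sense of scale, here is a minimal sketch of the webapp style of the time (illustrative only, not the Congress sample app): one Python file containing a handler and a WSGI application, paired with an app.yaml that points a URL pattern at it.

# main.py - a minimal webapp handler; the matching app.yaml simply maps
# "/.*" to this script. All names here are placeholders.
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class MainPage(webapp.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello from App Engine')

application = webapp.WSGIApplication([('/', MainPage)], debug=True)

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()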

The development server will be familiar to anyone who has tinkered with Django, and GAE handles static files via separate servers, which is how it should be. And since it comes with a version of Django built-in, you can bring along some handy utilities, like simplejson, with a single import statement. And as I said earlier, you don’t have to separate display logic into template files, but you can, and the syntax is nearly identical to Django templates.

The sample app takes two random members of the Senate and compares their voting and bill sponsorship records in the 111th Congress. The app’s code is like the app itself: fairly tightly-focused and without a lot of trappings. It’s just service calls to the API and a single template for display. In building it, I didn’t make use of any persistent storage, so I didn’t delve into Datastore, but it looks pretty useful. One of its helpful features is that as you develop your app, it generates indexes used to help return data in the most efficient manner.

If you’re already familiar with Django, making the small step to App Engine isn’t that big of a trip. Have Guido explain things to you first, and then try it out. You can also run a stripped-down version of Django on GAE, and I’m looking to see if there’s a project I can adapt to try it out. In the meantime, if you want to tinker with the sample app, by all means fork it and see what else you can do with the API. And let me know what you come up with!

Tags: Python

January 25 2010

01:10

Using Geocoders with GeoDjango

For a “15-minute project”, Simon Willison’s geocoders library is pretty handy if you’re doing geocoding with Python. It offers a common interface to the geocoding services provided by Google, Yahoo and other sources. When we were looking at replacing the home-grown geocoding system that Andrei Scheinkman built for Represent, Simon’s project seemed a natural choice.

It was an easy drop-in, but there was one thing about it that was just slightly off. A successful geocoding result looks like this:

(u'New York, NY, USA', (40.756053999999999, -73.986951000000005))

Notice the coordinate pair is latitude, longitude. For folks using GeoDjango alongside Simon’s library, the way you build a Point object from coordinates is to pass the longitude first, like so:

>>> from django.contrib.gis.geos import Point
>>> p = Point((5, 23)) # 2D point, passed in as a tuple
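
In other words, with the unmodified library you have to swap the pair yourself before building a Point. A small illustrative example (not from the original post):

from django.contrib.gis.geos import Point

# geocoders (unforked) returns (place_name, (latitude, longitude))
place, (lat, lng) = (u'New York, NY, USA', (40.756054, -73.986951))
pnt = Point(lng, lat)   # GeoDjango expects x (longitude) first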

So on Friday I forked Simon’s project and reversed the ordering of the coordinates in a successful result. That way you can pass that portion of the result directly to a Point constructor:

>>> from django.contrib.gis.geos import *
>>> from geocoders.google import geocoder
>>> geocode = geocoder('GOOGLE-API-KEY')
>>> results = geocode('new york')
>>> results
(u'New York, NY, USA', (-73.986951000000005, 40.756053999999999))
>>> pnt = Point(results[1])

Not a huge deal, but in keeping with the spirit of the library, I think.

November 25 2009

19:46

Keeping It Simple(r)

I haven’t mentioned Fumblerooski in a while, but rest assured that work continues, especially during college football season. I’ve added more coaching information (still a long ways to go on that, though) and will be unveiling player rankings soon. But the biggest thing I’ve done lately has nothing to do with new features. Instead, as I’ve become a better coder in general, I’ve seen how bloat can really hinder a project. So I spent time last week reorganizing the Fumblerooski code to take advantage of some of Django’s strengths.

This all started back at the NICAR conference in Indianapolis where several of us led a mini-bootcamp on using Django. At one point, as we talked about how projects are organized, I showed off the models for Fumblerooski. They went on forever. Looking back, it wasn’t the message that I wanted to get across – I think several people gasped.

Fumblerooski is still far more tightly coupled than I’d like – the scrapers can’t really be separated out as an independent app, which would be the right thing to do. But it’s getting closer. Same for the rankings app. Coaches could be the next one, or maybe players. The scrapers, even though they don’t constitute an actual app, are better organized. The point is that now the code is probably easier for someone else to follow, but it’s also easier for me to locate specific functions. I spend less time hunting and more time actually doing things.

How does this actually work? Python’s ability to import libraries into others means that Django apps can share common code (and, if you’re working in the same database, data) inside a single project just by typing an import statement:

from fumblerooski.rankings.models import RushingSummary

And I get access to the rushing rankings wherever I need to use them. Because this is so trivial, it sometimes led me to think that where I put things didn’t matter. But it does, it really does, for your sake and the sake of anyone who attempts to look at your code.
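
As a hypothetical illustration of the direction I mean (the app names other than rankings are invented for the example), a more decoupled project just registers each concern as its own app and lets imports do the sharing:

# settings.py - a hypothetical layout once the pieces are split into apps
INSTALLED_APPS = (
    'django.contrib.contenttypes',
    'django.contrib.sites',
    'fumblerooski.college',    # hypothetical: core teams, games and player models
    'fumblerooski.rankings',   # the rankings app imported above
    'fumblerooski.coaches',    # hypothetical: coaches split out on their own
    'fumblerooski.scrapers',   # hypothetical: scrapers as an independent app
)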

Tags: Python django