Sunday, March 23, 2008

Out-of-process memoization using memcached.

This is my humble recognition of memcached (“a high-performance, distributed memory object caching system”) usefulness. It is not a language or platform dependent tool, so do not overlook it just because I tend to talk about Python or Java.

Last weeks I have been fixing a lot of small bugs on a legacy application. After solving most of them, a challenging problem remained: Lots of database-intensive computations were taking too much time to complete. Where "too much time" == 48 hours. Some people suggested me to rewrite all that part of the application, but that was impractical with all the pressure of getting it done in one or two days. And it was the wrong solution, as explained at the end of this post.

So after a bit of research and profiling, I identified two critical points. One was solved by carefully rewriting a lot of SQL queries into just one or two, while modifying the program as little as possible.

The other problem was the program repeatedly computing the same thing over and over. It resembled the naive recursive fibonacci algorithm:

def fib(n):
if n == 1 or n == 2: return 1
return fib(n - 1) + fib(n - 2)

Of course, this is easy to solve adding memoization to the fib routine:

fibs = {1: 1, 2: 1}
def fib(n):
if n in fibs:
return fibs[n]
else:
result = fib(n - 1) + fib(n - 2)
fibs[n] = result
return result

My problem when applying this kind of solution was memory space: I feared running out of it, considering the amount of data processed by the program. In the end, I would have to program a little cache subsystem and deal with the feared global state. Or dig into the inner deeps of the system changing a lot of method signatures to pass the damn state along.

But I didn't. I just plugged a java memcached client library, surrounded the computation routine with code very similar to the second fibonacci example, fired a memcached instance on the development server, and it went way faster.

Now everybody is happy, including myself for not having to reimplement the whole computation. Rewriting code from scratch is tempting when nobody understand the existing one anymore. But we tend to underestimate the amount of work behind that unintelligible pieces of code. Much of their ugliness comes from covering a lot of corner cases. Coming with a new, shiny piece of code may not be that hard, but it will be buggy, until it passes a great deal of testing. If we can not understand the existing code, we can hardly thoroughly test the new one, just because we are not understanding all the corner cases. Only after we really understand the ugly code, we can write a better version from scratch. That was not my case.

Just for the record, the computation now takes 3 hours, for a 15x speed improvement :).

Saturday, March 8, 2008

How to Demotivate Me at Work

This a condensed list of things I hate, collected from the past five years of work for multiple software companies. In no special order:

  • Always assign me to projects where I am the only developer.

  • Throw me at unfinished projects where the single developer left the company.

  • Throw me at projects where the massive pre-existent accidental complexity (e.g, stupid technological infrastructure) never gets out of the way.

  • Put technical decision-making on non-technical people's hands.

  • Throw me at crappy, badly “finished” projects that no one understand anymore, just after a critical “milestone” happened (such as a very very angry customer, or a last-minute disaster). Bonus points for letting me solve them without help of other programmers. Add even more extra points for no help from other human beings at all.

  • Move me from project to project, without getting anything really finished.

  • Put more attention on how do I work than on what I produce.

Disclaimer: This is not a targeted critic against my current employer (or an specific past employer). As far as I know, this kind of shit happens, almost everywhere. But I would really like to work on a different world. Can we change it?

Sunday, March 2, 2008

Reading a Binary File on Ruby

No, it is not as simple as:

contents = IO.read(path_to_binary_file)

Because it does not work well on Windows platforms. You should do:

contents = open(path_to_binary_file, "rb") {|io| io.read }

Obviously, the key is the binary flag. It can not be passed to IO.read.

I learned this because a co-worker had a mysterious problem sending email attachments with rails. I was not able to help her (that IO.read never seemed suspicious to me), but once she found the solution, all became clear.

But you know, the real bug was she using Windows as a development platform, right?.

Update: Changed recipe to use open instead of IO.open. Thanks to Brian Hartin for pointing that in the comments.