Last week, I was quietly working on my web server when all of a sudden the whole thing ground to a near-halt. It wasn't completely dead, because it would still ping and every few minutes I would receive another packet of characters (I was literally in the middle of refreshing a screen I was looking at). Not knowing what was happening and not having any way to find out, I went and did something else for a while.
Forty-five minutes later, my server returned to normal operation as if nothing had happened. This did not appear to be just a network congestion problem; it was definitely something my server was busy doing. A bit of investigation showed that the culprit was in fact lnk.nu. Hundreds of machines all across Canada had accessed the same short link at the same time, completely pegging the PostgreSQL database processes and running my server out of memory. It's quite a testament to both FreeBSD and PostgreSQL that they survived at all.
What I believe happened was that somebody had sent an email containing a lnk.nu link to a mailing list to which lots of people from Canada were subscribed. (The link in question happened to be a job opening at who.int.) Looking up the PTR (reverse DNS) records for the machines that loaded the URL, I found names like "mail", "barracuda", "filtre", "antispam", "mx1", "incoming-smtp", "guardian", etc. It seems that they all accessed the link for purposes of virus checking, all at pretty much exactly the same time. This was not good for my poor server.
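For the curious, the lookup described above can be sketched in a few lines of Python. This is only an illustration and not the exact procedure I followed: the hint list is lifted from the hostnames quoted above, and the function names are made up for this example.

```python
import socket

# Substrings taken from the hostnames quoted above; purely illustrative.
SCANNER_HINTS = ("mail", "barracuda", "filtre", "antispam", "mx", "smtp", "guardian")


def looks_like_mail_scanner(hostname):
    """Return True if the first DNS label contains one of the hints."""
    first_label = hostname.lower().split(".")[0]
    return any(hint in first_label for hint in SCANNER_HINTS)


def reverse_lookup(ip):
    """Return the PTR name for an IP address, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


def count_scanners(ips, lookup=reverse_lookup):
    """Count the addresses whose reverse name looks like a mail gateway."""
    names = [lookup(ip) for ip in ips]
    return sum(1 for name in names if name and looks_like_mail_scanner(name))
```

Feeding the client addresses from the web server log into count_scanners() would give a rough idea of how much of the burst came from mail filters. The lookup parameter is there so the classification can be tried out without touching the network.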
I decided that it might be time to move lnk.nu to a different server. It's written in Python, so it's an ideal candidate for Google App Engine, and I've been looking for an excuse to play with GAE. So I downloaded the SDK, converted the code over to GAE (using Google's datastore instead of SQL), and made it work locally. This part was refreshingly easy and worked well.
The next step is to set up the Google site so it responds to http://lnk.nu and handles the requests appropriately. Given that I've already got the code working locally, that should be straightforward. However, there is one gigantic caveat when using Google web site services (that I've actually already run into for another project):
You cannot have Google's servers respond to a "naked" domain name that doesn't have a hostname. This means that having Google respond to http://lnk.nu is not possible.
(There is in fact a good technical reason for the above restriction. When you set up a site with Google hosting, you add a CNAME record to the DNS for your hostname, i.e. "www.example.com. CNAME ghs.google.com.". This lets Google completely manage the association between "ghs.google.com" and any particular IP address(es), which is critical for their load balancing setup. The caveat is that a name with a CNAME record must not have any other DNS records associated with it, including an SOA record. The SOA record is required on a "naked" domain name like lnk.nu, so you can't add a CNAME there.)
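To make the CNAME rule concrete, here is a small sketch that checks a list of records for the conflict described above. The (name, type) tuple format and the function name are my own invention for this example, and the check is simplified: it ignores the DNSSEC record types that are allowed to sit alongside a CNAME.

```python
def cname_conflicts(records):
    """Return the names where a CNAME coexists with any other record type.

    records is a list of (name, rtype) tuples, e.g. ("lnk.nu.", "SOA").
    """
    types_by_name = {}
    for name, rtype in records:
        # Normalize case and the trailing dot so "LNK.NU." matches "lnk.nu".
        key = name.lower().rstrip(".")
        types_by_name.setdefault(key, set()).add(rtype.upper())
    return sorted(
        name
        for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )


# A CNAME at the naked domain collides with the mandatory SOA and NS records.
naked = [("lnk.nu.", "SOA"), ("lnk.nu.", "NS"), ("lnk.nu.", "CNAME")]

# A CNAME on a hostname below the naked domain has the name to itself.
hosted = [("lnk.nu.", "SOA"), ("lnk.nu.", "NS"), ("a.lnk.nu.", "CNAME")]
```

Calling cname_conflicts(naked) reports lnk.nu as a problem, while cname_conflicts(hosted) comes back empty.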
To work around this, I'll have to set up a hostname that Google can respond to, something like http://a.lnk.nu. Of course, that's a pretty lame name for a link shortener to use, so I'll still want the published link to be http://lnk.nu/blahblah. This means that I'll have to have some other, non-Google server respond to a http://lnk.nu/blahblah request with a redirect to http://a.lnk.nu/blahblah. This adds another level of indirection to the resolution process for a shortened link, and therefore another browser round-trip, which might slow the whole experience down no matter how fast GAE hosting ends up being.
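The redirecting server doesn't need to do much. As a rough sketch, assuming a plain WSGI app and a permanent (301) redirect, it could look like the following. The target hostname is the example one from above, and the choice of 301 over 302 is my assumption rather than a settled decision.

```python
TARGET_HOST = "a.lnk.nu"  # example hostname from the text above


def redirect_app(environ, start_response):
    """WSGI app that redirects every request to the same path on TARGET_HOST."""
    path = environ.get("PATH_INFO") or "/"
    query = environ.get("QUERY_STRING", "")
    location = "http://" + TARGET_HOST + path
    if query:
        # Carry the query string over unchanged.
        location += "?" + query
    start_response(
        "301 Moved Permanently",
        [("Location", location), ("Content-Length", "0")],
    )
    return [b""]
```

Any WSGI server can host this, including wsgiref.simple_server from the standard library for a quick local test. Every shortened link still costs the browser two requests, which is the extra round-trip mentioned above.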
It turns out that Namecheap (my registrar for lnk.nu) offers "URL redirection" where their server will respond to a particular hostname and redirect the browser somewhere else. It can also be configured to retain URL path information, so http://lnk.nu/blahblah would redirect to http://a.lnk.nu/blahblah. This would completely take my own server out of the loop, hopefully avoiding any more problems like those last week.
2009-11-07T18:14:22Z