Date: 2007-07-25 19:11:00
the sixapart bastard child
Looks like LiveJournal is finally back after the big power outage. (What went wrong, anyway? Where was the redundant power?) However, let's have a look at http://status.sixapart.com, in order from first to last ...


Jul 24, 2007
TypeKey

TypeKey is up.
Posted at 1:55 PM PDT


Okay, so TypeKey is a pretty minimal service that should be easy to fire back up. (But why does this exist anyway? Why not use OpenID, which was developed by LJ and is now part of Six Apart?)

Jul 24, 2007
TypePad Service

Six Apart services today suffered from downtime due to the power outage in San Francisco and the impact it had on our co-location facility. At this time (9:37 pm PDT) all services are back online.

Update 3:55 pm: Power has been restored to our data center, and we are working to bring TypePad back online safely as soon as possible.

Update 4:07 pm: We are working to bring TypePad blogs online. You may experience intermittent availability of blogs and the TypePad application while we bring components of the service back online. Again, thank you for your patience.

Update 4:37 pm: We are beginning to bring TypePad blogs back online. You may see some slowness in loading images for a short while.

Update 5:05 pm: We are resolving the issues in loading images, and will be bringing the TypePad application up next.

Update 5:23 pm: We are now bringing the application back online. You may experience intermittent performance while we restore the application. Readers may experience issues commenting.

Update 5:55 pm: Blogs are back online and the application is back online for posting. We continue to work on restoring comment capabilities for your readers.

Update 6:25 pm: Blogs are online and the application is online for posting. We have identified the issue with commenting, and are working hard to restore that capability.

Update 7:26 pm: Commenting has been restored to TypePad blogs. At this point the TypePad service has been fully restored, and we will be monitoring its performance closely. We are also investigating why our data center's backup power systems did not respond properly. We will post more information when we have it both here and on everything.typepad.com.


The first priority for Six Apart is getting the "premier blogging service for professionals" back up. People seem to pay real money ($300/yr? seriously?) for that service.

Jul 24, 2007
Vox

Vox is back online.

Update 7:45 pm: We are working to bring Vox back online. We're currently performing a series of diagnostic tests to ensure that your data is safe. Thank you for your patience.

Update 9:25 pm: We are continuing to work to bring Vox back online. We are nearly complete with our diagnostic tests, and are working to restore access to the service as quickly as possible. Thank you for your patience!

Update 9:32 pm: Vox is now back online. We apologize for the inconvenience and thank you for your patience! We will be monitoring the service closely over the coming hours.


The next one to come back was Vox. This is a newer Six Apart offering that seems to put a friendlier face on blogging. And it's a free service. Notice that the freebie has been brought back up before the rest of Six Apart's paying customers over at LiveJournal.

Jul 24, 2007
LiveJournal

LiveJournal may be slow to load over the course of the next few hours. We'll continue to work to get the site up to full functionality.



As of the time of this post, it's still not completely back, according to the status page. Why is LJ at the bottom of the heap? I feel like a third-rate customer at this point. Oh wait, I am third. It's not that the power outage significantly affected my use of LJ, since I happened to be at work for the whole time it was out, but that just happens to be a function of the time zone in which I live. Surely Six Apart doesn't really think of LiveJournal as the free ad-infested giveaway public blog offering. Or maybe they do, and that's why we have useless cruft like virtual gifts instead of actual useful features.

Okay, I'm done kvetching.

Update: Looks like I wasn't the only one that noticed.
[info]edm : LiveJournal outage
2007-07-25T08:22:34Z
Interesting analysis.

What the delay actually said to me was "OMG LiveJournal's architecture is so complex that it takes forever to bring it back up". LJ have their own dedicated sysadmins (they certainly did prior to acquisition, the same people still seem to be around, and I've not seen anything to the effect that they're now generic SixApart admins). Presumably the other SixApart services also have their own admins. I assume all those sysadmins started working on bringing their respective sites up soon after the power came back, but it took LJ longer to get everything sorted out. (At least in the past this has been waiting on database integrity checks to complete, which seemingly take hours. That seems... unfortunate.)

I think it's fair to fault LJ for being slow to update the status of their progress bringing the site back up (in comparison to some of the SixApart services, I saw basically only two messages -- a "power outage, we're working on it", and an "it's up, but slow" many hours later). And it would be fair to fault LJ for relying entirely on two SixApart nameservers which are apparently hosted in the same facility (i.e., no off-site nameserver redundancy), which means that http://status.livejournal.com/ wasn't even resolvable until a while after the power came back. But I think it's reaching to say that other sites were brought up in preference to LJ, since I don't think people were the critical time factor -- rather disk I/O and perhaps memory/CPU were likely the critical factors.

Oh and their choice of hosting facility seems to leave something to be desired, in retrospect.

Ewen
[info]ghewgill : Re: LiveJournal outage
2007-07-25T08:31:44Z
Yeah, I understand and appreciate the complexities of starting up a big site like LJ. My "analysis" (if you want to call it that!) was based purely on external observations of timing and doesn't take into account the realities of bringing up a site with zillions of users.

The real WTFs, as they say, were: (1) the apparent failure of backup power, (2) the lack of diversity in nameservers (the first thing I did was go to http://status.livejournal.org which used to be hosted completely separately and note the different TLD), and (3) the lack of communication compared to other 6a services.

One hopes that Six Apart will receive some kind of rebate from 365 Main. :)
[info]edm : Re: LiveJournal outage
2007-07-25T08:49:18Z
Yes, I tried http://status.livejournal.org/ (with the different suffix), after getting no response from http://status.livejournal.com/ and determining it was a DNS issue. And then discovered they were on the same (down) nameservers, so it made no difference which one I used. There's also livejournal.net, which is apparently using "ns1.lamedelegation.net" :-) Oh, and I found that somewhere along the way they'd let lj.com lapse and it'd been picked up by what looks to be a squatter. So hopefully someone does something about fixing up the DNS infrastructure in the wake of this outage -- even a single host in some other data centre would allow answering DNS queries to point at a status page elsewhere.

I imagine 365 Main violated a bunch of SLAs by dropping power to racks, and will be having Discussions (tm) with a bunch of customers. I've certainly seen people in charge of running other sites hosted there rather disappointed that they lost power.

Ewen
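
The nameserver point in the comment above can be illustrated with a small sketch. It assumes you have already looked up the addresses of a domain's nameservers by some other means, and it simply groups them by IPv4 network to see whether they all sit in the same block. The names `shared_networks` and `has_network_diversity`, the `ns1.example.com` style hostnames, and the addresses (taken from the reserved documentation ranges) are all hypothetical and not Six Apart's real setup. Sharing a /24 is only a rough proxy for sharing a facility, so treat the result as a hint rather than proof.

```python
import ipaddress


def shared_networks(nameservers, prefix_len=24):
    """Group nameserver hostnames by the IPv4 network they fall in.

    `nameservers` maps a hostname to its IPv4 address (both supplied by
    the caller; this sketch performs no DNS lookups itself).
    """
    groups = {}
    for name, addr in nameservers.items():
        # strict=False lets us pass a host address and get its enclosing network
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups.setdefault(str(net), []).append(name)
    return groups


def has_network_diversity(nameservers, prefix_len=24):
    """Return True if the nameservers span more than one network."""
    return len(shared_networks(nameservers, prefix_len)) > 1


# Hypothetical example: both nameservers in one block (no diversity)
same_site = {
    "ns1.example.com": "192.0.2.10",
    "ns2.example.com": "192.0.2.20",
}

# Hypothetical example: one extra nameserver hosted somewhere else
with_offsite = dict(same_site, **{"ns3.example.net": "198.51.100.5"})

print(has_network_diversity(same_site))
print(has_network_diversity(with_offsite))
```

Under these assumptions, adding a single off-site host is enough to flip the check, which is the same observation as "even a single host in some other data centre" would have kept a status page resolvable.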
[info]bovineone : Re: LiveJournal outage
2007-07-25T15:27:40Z
What is funny is that Red Envelope (one of the customers hosted in 365 Main's SF facility) did a press release earlier in the day about their high uptime.

http://lnk.nu/prnewswire.com/f98.pl

During the outage, I verified that Red Envelope's website was inaccessible.
[info]decibel45 : Re: LiveJournal outage
2007-07-28T21:54:51Z
It's pretty shitty that it took them what appears to be over 5 hours to get LJ back up. I've been involved with fairly substantial systems before, and there's no reason it should take that long to get things up and running after a power failure.

Oh, wait... they're on MySQL. Never mind.
[info]mskala
2007-07-25T12:30:27Z
Greg Hewgill <greg@hewgill.com>