Date: 2008-11-15 13:28:00
wikimedia dumps
The Wikimedia download server was offline for a few months because of disk space issues. It's been back up and running for a few weeks now, and just recently finished dumping the English Wiktionary database. I have downloaded the database and refreshed my Wiktionary-to-dict gateway to 2008-11-12 (the previous update was 2008-06-13).
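
For the curious, the gateway starts from the pages-articles XML in the dump. Here is a minimal sketch of that kind of extraction step, not the actual gateway code; the file name is a placeholder and the helper names are mine:

    import bz2
    import xml.etree.ElementTree as ET

    def local(tag):
        # MediaWiki export XML is namespaced; keep only the local tag name.
        return tag.rsplit('}', 1)[-1]

    def iter_pages(path):
        # Stream (title, wikitext) pairs out of a pages-articles dump
        # without loading the whole file into memory.
        title, text = None, None
        with bz2.open(path, 'rb') as f:
            for _, elem in ET.iterparse(f, events=('end',)):
                tag = local(elem.tag)
                if tag == 'title':
                    title = elem.text
                elif tag == 'text':
                    text = elem.text or ''
                elif tag == 'page':
                    yield title, text
                    elem.clear()  # release memory for pages already handled

    if __name__ == '__main__':
        # Placeholder file name; the real dump files carry a date stamp.
        for title, text in iter_pages('enwiktionary-pages-articles.xml.bz2'):
            print(title)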

The English Wikipedia dump is currently in progress and, as of today, has an estimated completion date in June 2009. I expect it to be done sooner than that: the first articles dumped are the oldest ones, and those usually have the most revisions, so the dump starts out slowly. The time estimation algorithm appears to extrapolate from that slow early rate, which makes it overestimate the amount of time remaining (it was estimating July 2009 last week).
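
To see why a naive estimator behaves this way: if it just divides the number of articles remaining by the average articles-per-hour so far, and the early (old, revision-heavy) articles are the slow ones, the projection starts out far too pessimistic and drifts earlier as the dump reaches newer articles. A toy model with made-up numbers (not Wikipedia's real figures):

    # Toy model: 1,000,000 articles; the oldest 10% average 50 revisions each,
    # the rest average 5; the dumper moves through revisions at a fixed speed.
    TOTAL = 1000000
    OLD, OLD_REVS, NEW_REVS = 100000, 50, 5
    REVS_PER_HOUR = 200000  # arbitrary throughput

    def revisions(done):
        # Total revisions contained in the first `done` articles.
        return min(done, OLD) * OLD_REVS + max(done - OLD, 0) * NEW_REVS

    for done in (50000, 100000, 300000, 600000):
        elapsed = revisions(done) / REVS_PER_HOUR   # hours spent so far
        rate = done / elapsed                       # articles per hour so far
        naive_eta = (TOTAL - done) / rate           # extrapolated hours remaining
        true_eta = (revisions(TOTAL) - revisions(done)) / REVS_PER_HOUR
        print("%7d done: naive ETA %5.0f h, actual %5.0f h" % (done, naive_eta, true_eta))

With these numbers the naive estimate starts around 240 hours remaining against a true 35, and shrinks toward the truth once the dump is past the old, revision-heavy articles.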
[info]decibel45
2008-11-15T17:02:21Z
What's sad is that if they used Postgres, then based on their usage stats the last time I looked, they should be able to run all updates off one server and load-balance reads across read-only slaves. That would mean they could just use pg_dump and be done with it.

Shit, we have an 800GB database and could certainly dump it in less than a week.
[info]ghewgill
2008-11-15T19:16:53Z
I think most of the time is actually taken by running compression - they either bzip2 or 7zip all output files. But it's hard to tell because everything is conflated into one run.
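
Compression here is basically streaming very large XML files through a CPU-bound compressor. A rough sketch of that kind of step, assuming Python's bz2 module and placeholder file names (the actual dump scripts are more involved):

    import bz2
    import shutil

    def bzip2_file(src, dst, level=9):
        # Stream-compress src into dst; this is CPU-bound and is the kind of
        # step that dominates a full-history dump run.
        with open(src, 'rb') as fin, bz2.open(dst, 'wb', compresslevel=level) as fout:
            shutil.copyfileobj(fin, fout, length=1024 * 1024)  # 1 MB chunks

    # Placeholder names, not the real dump file layout.
    bzip2_file('pages-meta-history.xml', 'pages-meta-history.xml.bz2')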

Not that I'm saying using Postgres wouldn't be a good idea. :)
Greg Hewgill <greg@hewgill.com>