It appears that our poor little undead hard drive is experiencing an odyssey of discovery and exploration. Todd originally chose a recovery service due to their claimed expertise with our drive's particular problem and because they were only in San Jose. In reality, it turns out that their actual lab is in Vancouver, B.C., and that they have seemingly forgotten what they once knew about recovering from this particular problem. Their price estimates kept increasing and their percentage of likely success dropping, to the point that they wanted $1000 (in actual American dollars) for a 55% chance they could salvage data. Yeah, right. Thanks, guys.
Todd reclaimed the drive from them (after losing "only" $75 in shipping fees) and started shopping around. He found a new service in San Leandro that's probably going to want around $800, but sounds confident they can deal with our problem. As I've mentioned before, there's a known failure in that make of drive, so it stands to reason that recovery services have seen this problem before. I'm guessing that our bizarro filesystems on that drive (you mean people use something other than VFAT?) probably don't make life any easier for the techs. In any case, with a little luck and some skill on their part, we should have our data back in a couple weeks.
Apparently sometimes it takes Debian a few times to get things right. With any luck, this will be the last time any of you think about mime-support for quite some time:
"I am awfully and sincerely sorry. Apparently, I wasn't able to assign enough time to this issue and produced insufficiently tested updates. I'll do my best not to repeat this."
The DocRoot for www.weebairns.com has been recreated along with a new weebairns group. Group membership includes users 'richard' and 'amy' who own this domain. Once their content is in place we'll turn it on.
Jessica Grace Wing has her site back up again, under a veritable plethora of domain names (which feels pretty old-school, I have to say). Apparently she's staging a Goth opera now, which is a lot more ambitious than anything I've done recently. You should take a look.
Oops. Looks like it took the Debian team two tries to get all the kinks worked out.
Though the domain owner moved hosting to another system a while ago (oh, a year or two), it seems the name baldikoski.com was still hovering about in our DNS. This was mostly annoying to me, since I couldn't send mail to the domain owner without it bouncing. So I removed it.
Michael Lemme's periodic.net is back alive and kicking.
John Velcamp has started restoring his two domains. This brings us up to 36 domains, which is a far cry from the 100+ we had in the old days, but it's starting to feel like home again.
A Debian developer found some exploitable race conditions in the mime-support package; Debian has released a patch to address the problem, which we have now installed. Spore users should notice absolutely no change in how the machine behaves.
The dead hard drive is still up in Vancouver. Todd chose a company located in San Jose to do the recovery, because they have experience with Fujitsu drives that died in the particular manner in which ours did, which was a sensible thing to do. However, it turns out that the actual lab that does the work is up in Vancouver B.C. They shipped it up there, where it got tangled up in Customs for a few days. Apparently there's a fee that Todd could have paid to expedite its progress through Customs, but he only found out about it after the fact. The lab up there didn't actually receive shipment of the drive until Thursday, and they're still working on things. To expedite things, Todd has paid a little extra to get the drive shipped back express, so the drive shouldn't get snarled in Customs again. I can't give a definite ETA at this point, but I'd say within a week we should have the drive back and ready.
As has become painfully clear over the last few years, there are a lot of problems with public-key infrastructure. I used to work for a company whose primary product had to interact with the pointy end of PKI a number of times, and the hubris of people selling PKI products always astonished me. Entrust wanted to issue a certificate to every single citizen of Canada, and use those certificates for secure online elections. I'm not sure I even know how to explain to you how difficult a problem this is, but that didn't stop the folks at Entrust from trying. They failed, of course.
You guys don't care about any of that stuff, I'm guessing. You just want to be able to check your mail without getting hassled every time you check a message in Outlook Express about how our certificates are full of worms and are going to annihilate your computer and those you love in a flash of incandescent whiteness. Well, we don't have the mad bank to get a real certificate, but we've done the next best thing, and created a Spore Certificate Authority. Go ahead and install the CA certificate in your browser, or download it and import it into your mail client, should it support something that fancy (works: Mozilla, Microsoft Internet Explorer (with Outlook Express); doesn't work: Safari, Camino; doesn't care: Mail.app — feel free to update this list in comments). To test if it works, check out the SSL-protected version of our homepage. If your browser doesn't whine at you about the certificate being invalid, you've probably imported it all right. If you really want to import the certificate but don't have a clue how, contact us. If you're some kind of paranoiac cypherpunk and want to verify that there is some entity with a phone and an e-mail address who created that CA certificate, contact me, and we can pretend that we trust each other after reading a random string of letters and numbers at each other over the phone. (No, I'm not being completely serious, and no, I don't expect most of you to know or care what I'm talking about. Please move along.)
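For the paranoiacs: the "random string of letters and numbers" is a certificate fingerprint, a hash of the certificate's raw bytes. Here is a minimal Python sketch of computing one so you can compare it with what we read to you. The choice of SHA-256 and the function name `cert_fingerprint` are my assumptions for illustration; your browser or mail client may show a different hash type, in which case compare like with like.

```python
import hashlib
import ssl


def cert_fingerprint(pem_text: str) -> str:
    """Return a colon-separated SHA-256 fingerprint of a PEM certificate."""
    # Strip the PEM header/footer and base64 layer to get the raw DER bytes.
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    digest = hashlib.sha256(der).hexdigest().upper()
    # Group the hex digest into byte pairs so it is easier to read aloud.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

To use it, read the downloaded CA certificate into a string (for example `open("spore-ca.pem").read()`, where the filename is hypothetical) and pass it to `cert_fingerprint`. If the result matches what we tell you over the phone, you have the same certificate we created.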
Several exploitable flaws have been found in OpenSSL; Debian has released patched OpenSSL packages to address the problem. We've upgraded the affected packages and restarted all the servers using SSL. Fortunately for us, this didn't necessitate recompiling any packages; this update makes the OpenSSL library non-threadsafe.
Sean Mullin, brother of Spore sysadmin Rocky, now has the web site for his punk band back up. Punk seems a lot more relevant these days, don't you think?
If you used to use mail.spore.org to check your Spore e-mail via the web, you can now do so again. If you weren't doing it before but think it would be cool to do so now, then knock yourself out. If you have folders set up in your IMAP client, they should be visible as folders in Squirrelmail, and vice versa. If somebody wanted to donate the money necessary to buy an authenticated SSL certificate for mail.spore.org, you wouldn't even get any warnings from your web browser when you go to connect to it, which would be cool.
Although they haven't yet resuscitated all their content, Spore is once again offering web service for these domains.
As part of our ongoing effort to encourage ourselves to provide you with a reliable, robust hosting environment, we've installed a little service called uptimed. It tracks the system's longest stretches of continuous operation, and sends the sysadmins little congratulatory messages when certain milestones have been passed. You can see what the record uptimes are and where the current stretch ranks by typing "uprecords" at the command line; my hope is that there will be only one line in the output for a long time. Other systems I maintain have hit 430 days of continuous uptime without sweating too much, and it's my sincere hope that the new clone.spore.org will make this possible for us as well.
I talked to Todd today, and he said that the drive had been shipped from San Jose to the company's actual recovery lab, up in Vancouver. So it'll be a few more days yet before we get the recovered drive back (if, in fact, it's recoverable).
Thanks to Spore users Mark Chang and Mike Brodesky, and thanks to a very helpful article on the internet, clone now has an online, snapshot-based backup system in place. By "online" I mean that these snapshots are always available, so if you accidentally blow away a file and want it back, either grab it yourself (from /backups), or ask us to do it for you.
The way the system works is simple:
Due to the way that the system is written and some properties of Unix filesystems (in particular the ability to have two directory entries pointing to the same file, a property known as "hard linking"), keeping around all these snapshots doesn't actually occupy too much space. When the archive is fully populated, it will probably be about two to two and a half times larger than the directories that are being backed up.
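For the curious, here is roughly how the hard-link trick keeps snapshots small, as a minimal Python sketch. This is not the actual backup script; the function name `snapshot` and its arguments are made up for illustration, and it ignores details like permissions on directories, symlinks, and deleted files.

```python
import filecmp
import os
import shutil
from typing import Optional


def snapshot(source: str, previous: Optional[str], target: str) -> None:
    """Copy `source` into `target`, hard-linking files unchanged since `previous`."""
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(target, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged: add a second directory entry for the same data,
                # so this snapshot costs no extra space for the file.
                os.link(old, dst)
            else:
                # New or modified: this snapshot needs its own copy.
                shutil.copy2(src, dst)
```

Each snapshot directory looks like a complete copy, but a file that has not changed between two snapshots is stored only once. Deleting an old snapshot just removes its directory entries; the data stays around as long as any other snapshot still links to it.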
This is only a piece of our backup system; we're still working on getting a DAT jukebox that will allow us to make complete, offline backups that can be used to restore the whole system from a crash. We're also looking at getting a USB 2.0 card that will make running the snapshots much faster; right now the drive is pokey and is the limiting factor on how long backups take to execute. At the very least, though, this should make it harder to lose your web sites and mail, and if the system should crash again, we'll have a leg up on recovering it.
ns.spore.org (which is also née clone.spore.org) serves as the master DNS server for quite a few domains. Until today, there was no way for secondary DNS servers to perform the zone transfers necessary to keep their local information current, due to an oversight on the part of the administrators. This has now been fixed, and if you run one of these secondary DNS servers, you should stop seeing angry messages in your logfiles soon.
Thanks to incredibly swell Spore old-timer Mike Brodesky, the project now has a 120GiB Western Digital USB 2.0 hard drive attached to it. I've formatted it (as ext3 this time; no monkeying around with exotic filesystems) and mounted it on /backups, as well as performing a one-time backup of /www and /home. Spore user Mark Chang has provided us with a number of useful suggestions as to how to set up backups, and I'll probably be using some variant of them to institute some kind of ongoing backup system. Also, we may have a DAT jukebox hooked up to the machine soon, which would add another layer of (offline, this time) backup storage. Obviously, we learn from our mistakes. Thanks for the help, Mike!
By popular demand, the pop-before-smtp service has been installed on mail.spore.org (née clone.spore.org). If you want to use mail.spore.org as your outgoing mail server, all you have to do is check your mail first (via POP or IMAP, with or without SSL enabled) and then wait a couple seconds, and you should be able to send mail. Yes, for once, it really is that simple. Please let us know if it doesn't work for you, and we'll see what we can do.
NOTE: I personally would still prefer people who plan on sending a large volume of mail via mail.spore.org to contact us and set up SASL authentication — it's slightly more secure and much less random. It's also more likely to not behave quirkily, once you go through the pain of setting it up.
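If you're wondering why checking your mail first makes a difference: the general idea behind pop-before-smtp is that a successful POP or IMAP login marks your IP address as trusted for a while, and the mail server only relays for addresses on that list. Here is a minimal Python sketch of that bookkeeping. It is not the real service's code; the class name `RecentLogins` and the 30-minute window are assumptions for illustration.

```python
import time
from typing import Dict, Optional

WINDOW_SECONDS = 30 * 60  # assumed grace period, not our actual setting


class RecentLogins:
    """Remember which client IPs recently authenticated over POP or IMAP."""

    def __init__(self, window: float = WINDOW_SECONDS) -> None:
        self.window = window
        self._last_seen: Dict[str, float] = {}

    def record_login(self, ip: str, now: Optional[float] = None) -> None:
        # Called whenever a successful mail login from `ip` is noticed.
        self._last_seen[ip] = time.time() if now is None else now

    def may_relay(self, ip: str, now: Optional[float] = None) -> bool:
        # Relaying is allowed only within `window` seconds of the last login.
        now = time.time() if now is None else now
        seen = self._last_seen.get(ip)
        if seen is None:
            return False
        if now - seen > self.window:
            del self._last_seen[ip]  # expired, forget this address
            return False
        return True
```

This also shows why the scheme is "random": your permission to send quietly expires, and anyone else who shares your IP address during the window gets to relay too. SASL authenticates you rather than your address.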
We have upgraded libc to patch a potential root exploit (not much risk to us, because we don't use the Sun RPC code affected by the vulnerability). Performing this upgrade showed that the move from RedHat to Debian has already justified itself, after taking about 15 minutes and no real work to perform the upgrade.
We've installed and configured ntp, the Network Time Protocol daemon, so clone.spore.org now really knows what time it is. We're a stratum 3 server, which means that if you also run an NTP server, you can either be a client or a peer of ours; contact us if you're interested.
Here's a quote from Todd Courtois's recent e-mail detailing the progress of the recovery process:
"Yesterday I purchased a new 120GB drive and shipped it along with the dead drive to 1stDataRecovery in SJ.
"They've worked on this exact same problem ('BIOS can't detect Fujitsu MPG series drive') before and are pretty confident they can recover the data. The quote they gave was $550, which is in line with other quotes I received from Lazarus and a couple other places who couldn't give me a straight answer about whether they'd recovered data from this particular model of drive before. Since the problem is so common and so specific, I opted to go with someone who said they had experience with it."
Elicia David, who's been coordinating fundraising for the recovery, replied with:
"I have a list of people and their pledges, it adds up to around $500, so when the time comes I will hold them to it."
You people know who you are. (Seriously, thank you very much for offering to contribute!)
Finally, Todd replied with one further message:
"The actual work should only take a day. If the drives arrive there today, we may get the data back by Friday or early next week, after you account for shipping delay and so forth."
The Shotokan Karate Institute has their web site back up.
Apparently, a large set of Fujitsu drives, to which our poor dead drive belongs, have a known manufacturing fault that causes them to start behaving erratically after a certain period of time. Hardware recovery it is — and as I've said before, if you're willing to pitch in for the recovery, please contact Todd Courtois or Elicia David.
Miko Matsumura has his personal site (or the beginnings of it) back up again.
Bill Nyden's two domains, nyden.us and The California Sport Diver's Page, are now available for perusal, thus continuing our nautical / aquatic theme here at the Spore project.
Today, Todd (owner of rawthought.com) and I dropped by sharon.net to pick up the drive that died. It's a 40GiB Fujitsu drive, less than two years old, and it's been kept in a reasonably cool, shock-free working environment its entire lifespan. There's basically no reason for the little wimp to have given up the ghost yet, except sheer orneriness.
Todd is now doing his best to resurrect the drive. Before we go through the potentially costly task of sending the drive to Drive Savers, Todd's going to take a crack at bringing it up himself. Initial indications are cautiously optimistic, and even if Todd isn't able to recover the drive himself, it looks highly likely that getting Drive Savers to recover it will cost less rather than more (the range is between $500 and $2700). If you lost data, want it back, and are willing to contribute to the recovery pool, contact either Todd Courtois or Elicia David, the two Spore users who have graciously volunteered to head up the recovery efforts.
More updates as news warrants. It's looking more and more likely, though, that those of you who lost web sites or your inboxes will be able to get them back.
The Oacious / Illuminaughty Burning Man community has their web site back up and rockin'.
Michelle Valckenier-de Greeve has her portfolio website back up!
About Face appears to have restored all of their content and have gotten themselves back in business. If you haven't spent any time looking over this thoughtful, well-designed site exploring the issue of advertising portrayals of women and how the images presented affect women's perceptions of themselves, I recommend you do so. One of the pleasures of being an administrator of the Spore project is that it allows me to support resources like this one.
The web site for the Golden Gate Tall Ships Society has returned, thus helping restore the Spore position as a pre-eminent free resource for things nautical. Can pondyachts.com be far behind?
The web site for Mackenzie River Partners is now back up.
One of the most obvious things we've learned from the last week is that more than anything else, spore.org needs a sensible backup strategy. Discussion on what form this is going to take is happening right now, but we're open to outside suggestions, especially if they come with offers of help or hardware. Basically, we need to provide a solution that is fast, easy to maintain, and will provide us with a good hedge against catastrophic loss. It would be nice if we could recover lost files for individual users, but the focus will be on making it so that the system can survive another severe crash without data loss for the community as a whole.
I'm going to get up on my soapbox for a moment here and say something that should be obvious, but apparently isn't. I spent part of today responding to an e-mail from a very cranky person who took issue with my attitude in the transition FAQ. In particular, he thought our attitude towards system backups was "grossly negligent". I will agree that we could have done a better job of making it so that we didn't lose people's web sites and inboxes and the backups all at once. If we'd had more available time and hardware resources, this almost certainly would have been handled a different way. I'm sorry that the way we did (or didn't do) things cost a lot of you time and important data.
At the same time, it is never a good idea to trust other people to back up your critical data. We try very hard to make spore.org a professional, secure and reliable hosting environment, but the project is basically a shoestring volunteer organization, and as such we can make no guarantees about data integrity. We will do our best to keep random internet perverts from molesting your stuff, and we'll try to keep the whole thing from going up in a huge ball of flame, but Stuff Happens. Even when we get our fancy new backup system operational (and I guarantee you that there will be regular, scheduled backups happening before the end of the month), I implore you, please keep local copies of stuff on the system. It will make you, and us, feel a lot better.
Another four domains have been restored to the system, bringing the total to 26. That's still less than 15% of the domains we had before, but one of the things we're going to learn from this process is how many of those domains were actually still being used. This is far from the way I would have liked to discover this information, but I'm trying to look on the bright side of things.
For anyone who's curious, some changes have been made to how web servers are configured. It used to be that we turned on all the bells and whistles by default. Now, instead, for many things (like custom error pages) we enable those per virtual host. So if you used to use virtual includes, CGI scripts, and the like on your site, let us know, and we'll re-enable those features for you in the Apache configuration.
See also this thread on our delightful feature request forum.
Mark Chang has his (terse, minimal) personal site back up and ready for your perusal.
It's a little-known fact that there is a mailing list for distributing important messages to every user of the system: all-users at spore (abuse of this address is very strongly frowned upon). It's generated by a script available to system administrators called create-all-users-list, which was run at regular intervals on fungus.spore.org (the name for the machine that was, until its hard drive crashed, clone.spore.org). On the new machine, this list is now being generated at 3:23 every morning.
As part of keeping an eye on how things have been holding together since the transition, I've put a fair amount of work into weeding the cruft out of the messages sent by logcheck so that they'll be more useful. For those who don't know, logcheck grovels over the system logs once an hour and sends a digest of items appearing in the log that might be interesting to system administrators. Because Jared had some concerns about the signal/noise ratio, a new alias (called logcheck in /etc/aliases) was created to handle distributing the mails. These days, the logcheck mails typically weigh in at about 1-5k (down from a high of half a meg(!) per message when I first started this process) and are signal-rich enough to not set off any spam alarms. If you're an admin and are curious, go ahead and add yourself to the logcheck alias. In particular, if you see users you know on the list who have passwords that are going to expire soon, you might drop them a note.
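For those wondering what "weeding the cruft" amounts to, the general approach is to throw away every log line that matches a known-boring pattern and mail whatever is left. Here is a minimal Python sketch of that idea. It is not logcheck itself; the function name `interesting_lines` and the two ignore patterns are made up for illustration, and our real rule set is much longer.

```python
import re
from typing import Iterable, List, Pattern, Sequence

# Hypothetical ignore rules for routine, uninteresting log entries.
IGNORE_PATTERNS: List[Pattern[str]] = [
    re.compile(r"CRON\[\d+\]: .*session (opened|closed)"),
    re.compile(r"ntpd\[\d+\]: synchronized to "),
]


def interesting_lines(
    lines: Iterable[str],
    ignore: Sequence[Pattern[str]] = IGNORE_PATTERNS,
) -> List[str]:
    """Return the log lines that match none of the ignore patterns."""
    return [line for line in lines if not any(p.search(line) for p in ignore)]
```

Most of the work is in writing ignore patterns that are narrow enough to hide only the noise. A pattern that is too broad will quietly swallow the one line you actually wanted to see.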
To complement our custom error page for documents that cannot be found, Whileseated owner Michael Murphy has created a simple custom error page for his own domain. If you're a domain owner on Spore, you can do the same thing yourself! Just create a file within your domain's web root and send the location to us. We'll make the necessary changes to the web configuration!
Spore system administrator Rocky Mullin has restored web service for one of his personal domains, the idiosyncratically named caliban.sf.ca.us. There's no content there at present, but at least the site's back on the air.
test entry, more or less. but i did some work to add a new domain (great timing, eh? better than last monday, though) and sent a message for forrest to turn it on.
one thing i'm really missing is the sysadmin docs that were on the web site.
phpBB is about the best free bulletin board system out there (there aren't many, but that shouldn't be considered a slight against phpBB — it's very well-designed and easy to use). I downloaded a source package that a nice young man had made for Debian, which is located in /usr/local/src on clone.spore.org. If you've never seen what a Debian source package looks like, there's the place to look. I had to make a couple minor tweaks to get a package that would install on our Debian woody system, and then spent a little time configuring the system.
If you get a chance, tell a few other Spore users about it. It won't be useful unless a critical mass of users know about it, but if they do, it could prove to be very useful as a way of educating the user community. Also, obviously, we now have the ability to provide forums for hosted sites that may want them. If any of you have been wanting to set those up, now might be the time.
OK, I'm all done now; I have about a billion other things to do that have been piling up.
In keeping with my recent Doré theme, I have created a new custom error page for missing resources. I did this for two reasons:
The file itself is located at /var/www/reaped.html on clone.
Because I'm hoping that the entries typically will be brief, I went ahead and modified the RSS feed so that it would include the entire body of the entry. Oh, and yeah, that means that we now have XML feeds so that those of you with headline readers (such as the excellent NetNewsWire for Macintosh, which I use) can subscribe to this feed. There is an RSS 1.0 feed and an RSS 0.91 feed. Use them with abandon.
I updated the homepage to point to this page. Obviously, we eventually need to reskin this page so that it matches the rest of the site. Does anyone have any particular Gustave Doré images that they like? My source for the existing images on the site has been Therion's Archive, and I'm trying to stick to images from the folio of engravings that illustrate Dante Alighieri's Divine Comedy.
As you can see, Movable Type has been installed. This will allow Spore users to see what the administrators have been up to (it will also help the administrators keep an eye on each other). Anytime an administrator makes a change to the state of a Spore machine, a note should be made of it here.