Sunday night/Monday morning: I didn’t sleep very well, so I’m looking forward to Monday night. Unfortunately, I’m “on call” until 11pm, but I should be able to stay awake until then.
Monday 10.50pm: We start receiving tickets via our online Kayako helpdesk reporting that some people are having database issues. As I don’t want alerts beeping me at night, I start investigating.
Monday 11.15pm: There are a lot of Linux quota checking scripts running. Most odd – the maximum I normally see is 1 – and there are pages full of them here. And I can’t “kill -9” (forcibly terminate) them fast enough. Only one thing for it – server reboot.
Monday 11.30pm: Server hasn’t come back up yet. Hmm, perhaps it just stalled on the reboot. Try again.
Monday 11.45pm: Not good, it still hasn’t come back up. I’ll have to alert the datacenter for investigation.
Tuesday 12am: Datacenter tries a reboot with a console attached
Tuesday 12.30am: Server is undergoing a file system check (fsck)
Tuesday 1.15am: Still undergoing the fsck. This normally takes around 20 minutes, but the datacenter is reporting lots of “inode faults” on the /var/ partition on the server
Tuesday 2am: Still waiting for the check to finish
Tuesday 3am: Datacenter states it is unlikely to finish, but we agree to give it an extra hour
Tuesday 4am: I ask the datacenter to stick in a new hard drive and mount the old one as a slave.
Tuesday 8.30am: I’m already in the office, having had less than 3.5 hours of restless sleep. I start copying over the data from the old hard drive
Tuesday 5pm: Still copying, but only an hour to go
Tuesday 6pm: Most of the data is copied and no corruption
Tuesday 7pm: Nooooo! The entire datacenter goes down for nearly 45 minutes – when I had 6 SSH windows running to the server
Tuesday 8pm: Looks like everything is working, but let’s wait for customer responses.
Tuesday 9pm: Hmm, the email system doesn’t appear to be working correctly. It appears that the cPanel system can’t decide whether to use the old “mbox” format for emails or the new maildir format. The data from the failed hard drive expects to be read in the old format.
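(An aside for the curious: the two formats are easy to tell apart on disk. An mbox is one flat file whose messages each begin with a “From ” line, while a maildir is a directory containing cur/, new/ and tmp/ subdirectories. Below is a rough sketch of how one might check which format a restored mailbox is in. The function name `detect_mail_format` is made up for illustration, and it is not how cPanel itself decides.)

```python
import os


def detect_mail_format(path):
    """Guess whether `path` holds a maildir or an mbox mailbox.

    Hypothetical helper for illustration only; it looks at the on-disk
    layout and nothing else.
    """
    if os.path.isdir(path):
        # A maildir is a directory with cur/, new/ and tmp/ inside it.
        required = ("cur", "new", "tmp")
        if all(os.path.isdir(os.path.join(path, d)) for d in required):
            return "maildir"
        return "unknown"

    if os.path.isfile(path):
        # An mbox is a single file; its first message starts with "From ".
        with open(path, "rb") as fh:
            first_line = fh.readline()
        if first_line.startswith(b"From "):
            return "mbox"

    return "unknown"
```

Running something like this over each restored mailbox path would at least show which accounts still hold old-format data before the mail server is pointed at them.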
Tuesday 10pm: Email appears to be mainly working
Tuesday 10.30pm: Nope, still got minor problems with IMAP authentication
Tuesday 11pm: All is checking out and responding.
Tuesday 11.30pm: I head home
Wednesday Midnight: I’m home!
Wednesday 12.15am: I expect to have dinner
Wednesday 1am: I expect to be able to fall asleep
Wednesday 8.30am: Gotta be in the office again.
I’m going to be a sleep-deprived zombie by the end of today (Wednesday!)