Author Topic: Website issues - 10th July 2014  (Read 6276 times)

[Al]

  • Server Monkey
  • Site Admin
  • God Like!
  • *****
  • Posts: 12304
  • Tweaking required.
    • Southerndownhill.com
Website issues - 10th July 2014
« on: Jul 11, 2014, 16:35 »
You may have noticed, we have had a few issues...

At about 1am on Thursday, July 10th, the site went offline. No remote access either. The console wouldn't allow login, just a lot of I/O errors. Turns out both our hard disks had failed (it's a hardware RAID1). The ISP pulled both drives, put two new ones in, installed a blank operating system and wished us luck with our backups...

Thankfully we do take backups, every night. Unfortunately they were imperfect. We had a database issue during a WordPress update back in November, which meant we had to restore from a backup. We restored into a temporary database, as you do, except we never moved back to the real one. So the nightly backups have been backing up a stale, corrupt copy of the front page database every night since then. Oops.
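
For anyone wanting to avoid the same trap, here is a minimal sketch of a freshness check that would flag a backup whose newest record is far older than the backup itself. It is an illustration only, not what we actually run: the function name `is_backup_stale`, the seven-day threshold and the example dates are all assumptions.

```python
from datetime import datetime, timedelta


def is_backup_stale(newest_record: datetime,
                    backup_taken: datetime,
                    max_lag: timedelta = timedelta(days=7)) -> bool:
    """Return True if the newest record in the backup is older than
    max_lag relative to the time the backup was taken.

    newest_record: timestamp of the most recent row found in the dump
                   (hypothetical input; how you extract it depends on your DB).
    backup_taken:  when the backup job ran.
    """
    return backup_taken - newest_record > max_lag


# Hypothetical dates: a backup taken in July whose newest article
# dates from the previous November should be flagged as stale.
stale = is_backup_stale(datetime(2013, 11, 15), datetime(2014, 7, 9))

# A backup whose newest record is from the day before is fine.
fresh = is_backup_stale(datetime(2014, 7, 8), datetime(2014, 7, 9))

print(stale, fresh)
```

Run after each nightly backup, a check like this turns "the backup job succeeded" into "the backup job succeeded and contains recent data", which is the part that was missing here.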

We have all the filesystem stuff, we have the forum DB and we have the front page database from a manual backup from a few months ago. That's all now been restored, and as you can see the forum is working normally (minus a day's worth of posts). The front page will be back once we've put a bunch of articles back in manually from drafts/caches. We're also looking at sending the failed drives to a recovery company to try to restore all the articles, but that will take a while.

So, the front page will be back up once we have it looking OK. The forum should be fine. If you spot any issues, let us know.

Cheers,
Al
Ride fast, take chances. Just don't blame me when you fall off.

scar4me

  • Immortal
  • God Like!
  • ******
  • Posts: 2319
Re: Website issues - 10th July 2014
« Reply #1 on: Jul 11, 2014, 17:47 »
I feel your pain.
Had a multi-drive failure in a live 5TB RAID5 set the other week at work, and it didn't have any hot spare! :'(
Nasty scenarios, but I managed to force a failed drive online long enough for one replacement to rebuild, then swapped the second failure!
You only really realize your backup problems/mistakes when it's critical and you need them!

Glad to see it's all back up now :)

[Al]

  • Server Monkey
  • Site Admin
  • God Like!
  • *****
  • Posts: 12304
  • Tweaking required.
    • Southerndownhill.com
Re: Website issues - 10th July 2014
« Reply #2 on: Jul 11, 2014, 18:27 »
If only we could have forced the drive back online... One drive rattles, the other lists no partition table. How long do you reckon the ISP has been ignoring the RAID alarm? *sigh*
Ride fast, take chances. Just don't blame me when you fall off.

minty

  • Immortal
  • God Like!
  • ******
  • Posts: 1441
  • Don't be a...
Re: Website issues - 10th July 2014
« Reply #3 on: Jul 11, 2014, 21:38 »
I feel a bit of a turkey after reading this gobbledygook - gobble gobble ;d

well done for rectifying it :)
''hey - I'm curious.... the girl in your avaatar - who is it?  recognise her but can't put my finger in it...''

xiphon

  • Immortal
  • God Like!
  • ******
  • Posts: 2754
  • A southerner oop north!
Re: Website issues - 10th July 2014
« Reply #4 on: Dec 22, 2014, 14:58 »
Move to the cloud!
I love citrus fruit!

[Al]

  • Server Monkey
  • Site Admin
  • God Like!
  • *****
  • Posts: 12304
  • Tweaking required.
    • Southerndownhill.com
Re: Website issues - 10th July 2014
« Reply #5 on: Dec 22, 2014, 15:20 »
Why? It's no more reliable and costs more. 
Ride fast, take chances. Just don't blame me when you fall off.

xiphon

  • Immortal
  • God Like!
  • ******
  • Posts: 2754
  • A southerner oop north!
Re: Website issues - 10th July 2014
« Reply #6 on: Dec 22, 2014, 15:21 »
No more reliable?  Seriously?
I love citrus fruit!

[Al]

  • Server Monkey
  • Site Admin
  • God Like!
  • *****
  • Posts: 12304
  • Tweaking required.
    • Southerndownhill.com
Re: Website issues - 10th July 2014
« Reply #7 on: Dec 23, 2014, 00:18 »
Just today Rackspace cloud failed... Unless you are using portable VMs, which only Joyent do iirc, it's still a single hardware setup. Sure, if we used more than one server we could increase redundancy, but that's the same for dedicated or cloud. Unless you are talking shared environments, which suck for large DBs, which is why we don't use them.
Ride fast, take chances. Just don't blame me when you fall off.

xiphon

  • Immortal
  • God Like!
  • ******
  • Posts: 2754
  • A southerner oop north!
Re: Website issues - 10th July 2014
« Reply #8 on: Dec 23, 2014, 16:16 »
AWS BeanStalk?

If you think it's no more reliable, you're clearly doing it wrong, ha ha.

I run the infra behind a site with about 4-6 million unique visitors per month - served up from only 2 AWS instances (using a handful of other AWS products too, like RDS).  No global CDN needed, just an ELB in front of them.

With the low forum traffic these days, you probably could serve SDH off a shared platform ;)
I love citrus fruit!

[Al]

  • Server Monkey
  • Site Admin
  • God Like!
  • *****
  • Posts: 12304
  • Tweaking required.
    • Southerndownhill.com
Re: Website issues - 10th July 2014
« Reply #9 on: Dec 24, 2014, 07:49 »
I use AWS for some clients (auto-scaling, Chef-deployed stuff via OpsWorks). It fails every now and then too. It costs more than the server we have does, and the new box costs even less. It's really not worth it.
Ride fast, take chances. Just don't blame me when you fall off.

