Monday, 4 January 2010

Glad my backup was working, only it isn’t

So I have re-installed Proxmox, updated to the latest version, done the LVM partitioning dance and started downloading the 50GB of backup data.

Our backup strategy involves:

  • take a (compressed) snapshot of each virtual machine
  • use rdiff-backup to capture deltas
  • encrypt the rdiff-backup ‘database’
  • copy the encrypted files to two different machines

So I checked the timestamps on the local backup and, great, they are from last night’s backup.

Once the backups are on the production machine I restore them and start the virtual machines.  Magic.

Only not – the data for some reason stopped on the 23rd of December.  Hmm – strange.  Check the backup logs/emails – yep fine.  Check the timestamps – yep – fine.

Hmm. With a sinking feeling I start to realise that even though *something* was being backed up, it wasn’t the production data, not since the 22nd of December anyway.

Thinking it through, I had that terrible ‘doh!’ moment when I realised that a fairly small, innocuous little item on my todo list was actually quite important…

We use EncFS to back up a filesystem. EncFS works by mounting an encrypted directory at a second location, where its contents appear decrypted. In real-world terms this means there is a directory which is an encrypted mirror of another directory: create a new file in the plain directory and, as if by magic, a new encrypted file will appear in the encrypted directory.

The way that we encrypt the rdiff-backup database is by rsyncing it into the plain directory.

Guess what that last little todo was?  To mount the encrypted filesystem after system reboots.  Reboots like the one that happened on the 23rd of December.

So all the little pieces were running. The only problem was that, because the encrypted file system wasn’t mounted, rsync was writing into an ordinary directory: everything appeared to work, but the encrypted files were never updated.
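One way to make this failure loud rather than silent is to refuse to sync unless the plain directory really is a mount point. Below is a minimal Python sketch of such a guard. The function names and paths are hypothetical illustrations, not taken from the original setup, and it assumes the plain directory is an EncFS (FUSE) mount, which `os.path.ismount()` reports as a mount point while it is mounted.

```python
import os
import subprocess


def require_mounted(path: str) -> None:
    """Raise RuntimeError if `path` is not currently a mount point.

    While EncFS is mounted, the plain directory sits on a FUSE filesystem,
    so os.path.ismount() returns True. After a reboot without remounting it
    is just an ordinary directory, and this check fails instead of letting
    rsync quietly write into it.
    """
    if not os.path.ismount(path):
        raise RuntimeError(f"{path} is not mounted; refusing to sync")


def rsync_command(src: str, dest: str) -> list:
    """Build the rsync invocation: archive mode, mirroring deletions."""
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    return ["rsync", "-a", "--delete", src.rstrip("/") + "/", dest]


def sync_to_encfs(src: str, plain_dir: str) -> None:
    """Hypothetical nightly step: copy the rdiff-backup data into EncFS."""
    require_mounted(plain_dir)  # fail before touching anything
    subprocess.run(rsync_command(src, plain_dir), check=True)
```

In a nightly job this might be called as `sync_to_encfs("/backup/rdiff", "/backup/encfs-plain")` (both paths hypothetical). If the mount is missing, the uncaught exception makes the script exit with a non-zero status, which the job's log or email can then report as a failure instead of a success.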

Doh!

Luckily, we use git for source code management, which means the last developer to work on the code base would (as per best practice) have an up-to-date clone of the repository. That developer simply needs to pull and then push, and the source code server is up to date.

If only the wiki etc. was that simple :(

The one silver lining is that, because this happened over Christmas, we didn’t actually lose anything, but we did find a critical problem in our (OK, my) backup strategy.
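A complementary check is to look at the files that actually leave the machine, the encrypted copy, rather than the local rdiff-backup data or the job's logs. Below is a minimal Python sketch, assuming each nightly run writes at least one new or modified file into the backup. The 26-hour threshold (one nightly run plus some slack) and the function names are hypothetical choices, not part of the original setup.

```python
import os
import time
from typing import Optional


def newest_mtime(root: str) -> float:
    """Most recent modification time of any file under root (0.0 if none)."""
    newest = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            newest = max(newest, os.path.getmtime(os.path.join(dirpath, name)))
    return newest


def is_stale(root: str, max_age_hours: float = 26.0,
             now: Optional[float] = None) -> bool:
    """True if nothing under root has changed within max_age_hours.

    An empty directory counts as stale, since newest_mtime() returns 0.0.
    """
    if now is None:
        now = time.time()
    return now - newest_mtime(root) > max_age_hours * 3600
```

Run against the encrypted directory, and against the copies on each of the two receiving machines, a `True` result would be the signal to send an alert. A check like this looks at the output of the backup rather than at whether each step reported success.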
