Thursday 11 February 2010

Running pveperf from Proxmox on any distribution

So, in the quest to solve the mystery of why my 4x300GB 15k SAS disks are performing like a dog, I turned up a number of interesting leads - one of which was a possibly dodgy kernel driver.

So, off I went and installed CentOS (aka the poor man's RedHat) - if they don't support the hardware properly then no-one will - and I naturally typed 'pveperf'. pveperf is a lightweight performance profiling tool that comes with Proxmox. Unfortunately, it only comes with Proxmox.

It is, however, the quickest way I know of to find out how many fsyncs per second the drives will do, and that number is pretty critical for hosts running lots of virtual machines that will be IO bound.

Anyway, it turns out that pveperf is written in Perl, so getting it working on any other machine is pretty trivial. Instructions here.
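For reference, the fsync figure pveperf reports can be approximated in a few lines. This is my own rough Python sketch of the idea (write a small chunk, fsync, count cycles per second), not the actual pveperf code - the 512-byte write size and one-second window are arbitrary choices of mine:

```python
import os
import tempfile
import time

def fsyncs_per_second(path=".", seconds=1.0):
    """Rough analogue of pveperf's FSYNCS/SECOND figure:
    count how many write+fsync cycles complete in `seconds`."""
    fd, name = tempfile.mkstemp(dir=path)
    try:
        count = 0
        deadline = time.time() + seconds
        while time.time() < deadline:
            os.write(fd, b"x" * 512)  # small write, like a DB log record
            os.fsync(fd)              # force it down to the platters
            count += 1
        return count / seconds
    finally:
        os.close(fd)
        os.unlink(name)

if __name__ == "__main__":
    print("FSYNCS/SECOND: %.1f" % fsyncs_per_second())
```

Run it on the filesystem you care about (the `path` argument) - a battery-backed RAID cache will report thousands, bare spinning disks more like tens to hundreds.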

Wednesday 10 February 2010

Virtualisation - it rocks!/what a nightmare!

I love virtualisation. I hate virtualisation. Arrgggh!!!
I administer our development infrastructure which consists of the usual suspects - ldap server, file server, wiki, jira, source code repo, continuous integration servers etc. Being a small outfit I decided we should do the virtualisation dance. I am no stranger to virtualisation having used it before (on my laptop and in production).
So, my first foray into 'grown up' virtualisation involved using KVM on top of Ubuntu. This worked really well and it is amazing what you can pack into a single Xeon quad core with 8 gigs of RAM :). One minor niggle is that it was roll-your-own for pretty much everything, including backups. Taking snapshots of the virtual machines using LVM was a doddle, but mounting those snapshots and tarring them up was a bit of a pain. A few bash scripts later and everything worked, but it just wasn't reliable enough.
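For the curious, the snapshot-mount-tar dance looked roughly like this. This is a hedged Python sketch of the sort of thing my bash scripts did, not the scripts themselves - the volume group and LV names (`vg0`, `vm-data`) are made up, and the `dry_run` flag lets you inspect the commands before letting it loose as root:

```python
import subprocess

def backup_vm(vg, lv, backup_dir, dry_run=False):
    """Sketch of an LVM snapshot/mount/tar backup for one VM volume.
    Volume group, LV and path names are illustrative only."""
    snap = lv + "-snap"
    mnt = "/mnt/" + snap
    cmds = [
        # take a copy-on-write snapshot of the live volume
        ["lvcreate", "-L", "1G", "-s", "-n", snap, "/dev/%s/%s" % (vg, lv)],
        ["mkdir", "-p", mnt],
        # mount it read-only and tar up the contents
        ["mount", "-o", "ro", "/dev/%s/%s" % (vg, snap), mnt],
        ["tar", "-czf", "%s/%s.tar.gz" % (backup_dir, lv), "-C", mnt, "."],
        # clean up: unmount and drop the snapshot
        ["umount", mnt],
        ["lvremove", "-f", "/dev/%s/%s" % (vg, snap)],
    ]
    if not dry_run:
        for cmd in cmds:
            subprocess.check_call(cmd)
    return cmds
```

The fragile part in practice is the cleanup: if the tar fails half-way, you're left with a stale snapshot eating copy-on-write space - exactly the sort of edge case that made "reliable enough" hard to reach with home-grown scripts.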
For technical (financial) reasons we had to move to a different hosting company and I noticed they offered Proxmox, a Debian-based distribution which includes a really nice web app for administering both KVM and OpenVZ containers. It rocks, it really does. Setting up virtual machines was a cinch - backing up was as complicated as ticking a tick box - woot.
After 4 or 5 months of bliss I noticed the CI server was running quite slowly and it turned out that it was Disk IO bound (as you would expect with what is essentially a DB server). So, on investigation I decided to find some nice SAS 15k 300GB disks to play with.
Anyway, new month, new hosting provider and I thought I would take a look at the big boys - XenServer and ESXi. They both insist on a Windows client, which was a blessing and a curse - I don't like Windows, but the other guy I have pegged as a substitute SysAdmin really does. Installation on both went fine and I was very impressed. You could feel the big corporate brains behind it. Both are free but offer commercial add-ons. For ESXi you need to pay silly money if you want to manage more than one host, whereas XenServer will let you manage a cluster for free.
So, after installing XenServer using DRAC (this is a Dell R410) I rub my hands with glee!
Anyway, the first thing I wanted to do was set up the machine that caused the most issues - the database seeder. It really just reads a row, does something in Java, then writes it out to a different database.
So, I installed a Windows 7 VM - both offer PV drivers (for efficiency) for Windows 7. Installation was a breeze, those SAS disks really are a thing of beauty.
Then I installed the updates (gotta love Windows), then SQL Server 2005 Express and so on. There are 4 SAS disks in there, two in RAID 1 and two non-RAID. I put the OS on the RAID pair, the source DB on one of the non-RAID disks and the target DB on the other non-RAID disk. Wow, this was going to move!
Finally I had the code base on there and clicked the button to seed. I was holding my breath to watch this thing fly! As a reference, I have Leopard running on a 2007 MacBook Pro. I use VMware Fusion to run Windows XP with 1.5GB RAM. Compiling the application on the laptop VM takes about 400 seconds. Compiling on this beast takes less than 100. Woot. This is going to be *great*!
So I click the button to start the seeder - the first part scans the entire source database to make sure it is sane. Wow - that absolutely flew by. Amazing. Then it started to write to the new database. Every few seconds it prints out how many rows it has written. It prints 300, 650, 900, 1400. 'Hmm', I think, 'I am sure the VM on the laptop is faster than this!' so I check - and yep, indeed - the laptop VM prints out '1300, 1500, 3740, 6010' (the second number is an anomaly, but hey). Eh? What?!?
So I start the entire process again, and yep, same behaviour. In the end I run both seeders at the same time (started within seconds of each other). The VM on the new machine absolutely flew through the sanity check and started writing minutes before the laptop VM. However, once the laptop started writing it soon caught up and left the new VM in its dust!
After pulling my hair out many many times and reading *the whole web* I decided to try Windows 2008, as that supposedly understands how to be a virtual citizen slightly better than Windows 7. Same thing. Admittedly the performance of Windows 2008 was better, almost 75% faster at writing than Windows 7, but still nowhere near my silly little laptop.
Hmmm, I think. I try a Windows 7 install without installing the xen-tools (which install the drivers for paravirtualisation) and unbelievably the performance without the go-faster stripes is faster than with! Only by about 20%, and still nowhere near the laptop performance, but this is insane. Some more googling brings up this and this (my post).
Bugger this, I think to myself. Fire up the remote console using DRAC and install ESXi. Installation was a dream. Install the vSphere client on my now rather infamous VM on my laptop (sigh).
Install Windows 7.... and blow me if it isn't exactly the same behaviour. Same speedy performance on reading, terrible performance on writing to a database. At least installing the PV drivers didn't slow it down, but it didn't help either.
On both XenServer and ESXi I tried a simple copy of a 600MB CD image, which took about 15 seconds (in the VM) - on my lappy it takes many many minutes. The SAS disks *are* working for large sequential reads and writes, but lots of small writes (as evidenced by the database inserts) seem to bring them to their knees. I posted on the ESXi forums, but no help there yet.
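The two workloads are easy to tell apart with a toy benchmark. This is a hypothetical Python sketch (not what I actually ran) contrasting one big streaming write, fsynced once at the end, against many small writes each forced to disk - the sizes and counts are arbitrary:

```python
import os
import tempfile
import time

def mb_per_s_sequential(total_mb=16):
    """Big streaming write, one fsync at the end -
    the 'copying a CD image' case."""
    fd, name = tempfile.mkstemp()
    try:
        start = time.time()
        for _ in range(total_mb):
            os.write(fd, b"\0" * (1 << 20))  # 1MB chunks
        os.fsync(fd)
        return total_mb / (time.time() - start)
    finally:
        os.close(fd)
        os.unlink(name)

def writes_per_s_synced(n=50):
    """Many small writes, each followed by fsync -
    the database-insert case."""
    fd, name = tempfile.mkstemp()
    try:
        start = time.time()
        for _ in range(n):
            os.write(fd, b"x" * 512)
            os.fsync(fd)  # every insert waits for the disk (and any RAID cache)
        return n / (time.time() - start)
    finally:
        os.close(fd)
        os.unlink(name)
```

On healthy hardware both numbers are respectable; on a setup like mine the first flies while the second crawls, because every fsync has to round-trip through whatever the hypervisor and RAID card are doing with write barriers.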
Unfortunately, Debian, and therefore Proxmox, does not work on this machine because the installer gets confused about the order of the disk devices - it installs fine but after a reboot the disks cannot be found. Nice. (forum post).
So, what now? I am not sure. I get that virtualisation has a performance cost, but please don't expect me to believe that a VM on my bog standard laptop, with its infamously slow disk, will perform better than one on a commercial-class virtualisation tool. In fact I know the new disks work - copying the CD takes hardly any time at all (either on the host or in the VM).
Three days with hardly any sleep, 15 installations of Windows VMs, 4 installations of virtualisation products, 3 failed Linux installs (Debian, Proxmox and CentOS) later I am thinking 'I don't really care anymore'.
Next step will be to speak to the hosting providers - they have been really really helpful, but I can't expect them to support third-party software - and ask them to take out the RAID card, giving me 4 plain SAS disks, to see if that helps.
Sigh.