Wednesday, 23 March 2011

Surprise when profiling Java/Groovy app with JProfiler

First, if you are a Java programmer you really need to get hold of JProfiler: http://www.ej-technologies.com/products/jprofiler/overview.html

Long story short, I profiled a long-running application thinking I knew exactly where the performance penalty was - it wasn't anywhere near there :) It was in the equals() method of a Groovy class which was implemented as:

boolean equals(Object other) {
    other instanceof ThisClass && this.hashCode().equals(other.hashCode())
}

hashCode:

int hashCode() {
    this.name ? name.hashCode() : 0
}

(This class has a single String property, set in the constructor, so I could easily cache the hashCode.) But flip me, who knew this would be a bottleneck accounting for 17% of the CPU time? OK, it was called over 3 million times, but still....

The solution? I don't know if it is the cost of Groovy or what, but my first call is to cache the hashCode and reference that in the hashCode and equals methods, then run the profiler and see if that helped. Second step will be to re-implement in Java and see if that helps....
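A minimal Java sketch of that first step, assuming the class really is immutable. NamedThing and cachedHash are made-up names for illustration, not the real class. Unlike the Groovy version it also compares the names themselves, since two different strings can share a hash code:

```java
import java.util.Objects;

// Hypothetical stand-in for the Groovy class: one immutable String property.
final class NamedThing {
    private final String name;
    // Computed once in the constructor; safe because name never changes.
    private final int cachedHash;

    NamedThing(String name) {
        this.name = name;
        this.cachedHash = (name != null) ? name.hashCode() : 0;
    }

    @Override
    public int hashCode() {
        return cachedHash;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof NamedThing)) return false;
        NamedThing that = (NamedThing) other;
        // Cheap rejection on the cached int first, then confirm on the name,
        // because equal hash codes do not guarantee equal strings.
        return cachedHash == that.cachedHash && Objects.equals(name, that.name);
    }
}
```

Whether this actually removes the hotspot is something only another profiler run can tell.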

Long story

There was a Java/Groovy based app which used Hibernate and SQL Server which I migrated to use MongoDB. The app had some non-trivial complex hierarchies which were a pain to store in a relational DB but trivial to store in a document DB. There was an obvious aggregate root for the data structure so the conversion was very straightforward.

The application was fantastically quick for loading a single document but the user's home page displays all their active documents, of which there might be hundreds. The SQL Server application uses a denormalised summary table which condenses each document into a single row, so that is lightning quick, but the MongoDB app was really slow.

Given that Hibernate has a first level cache *and* given that the documents have lots of references *and* given that MongoDB has no such first level cache, you might follow my line of thinking that the performance cost was in Mongo loading each referenced object over and over again.

Like me, you would be wrong :)
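For anyone who hasn't met the idea, a first level cache is essentially an identity map scoped to one unit of work: each id is fetched once and then reused. A rough sketch in Java, where IdentityMap and the loader function are hypothetical stand-ins for a real lookup and are not taken from the app or from Hibernate:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-request identity map: each id is loaded at most once.
final class IdentityMap<K, V> {
    private final Map<K, V> loaded = new HashMap<>();
    private final Function<K, V> loader; // stands in for a database lookup
    private int loads = 0;

    IdentityMap(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K id) {
        V value = loaded.get(id);
        if (value == null) {
            // First request for this id (or the loader returned null last time).
            value = loader.apply(id);
            loads++;
            loaded.put(id, value);
        }
        return value;
    }

    // Number of times the loader was actually invoked.
    int loadCount() {
        return loads;
    }
}
```

Counting loads like this is also a cheap way to check a hunch about repeated fetches before reaching for a fix.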

Lesson of the day - use JProfiler - it just might surprise you!

Monday, 21 March 2011

VMware - moving hosts between data centers

I wanted to move vCenter to a virtual machine running on one of the ESXi hosts that it was managing. Chicken and egg, huh :) but this is actually a supported configuration.

So I removed all the hosts, shut down vCenter, created the new virtual machine, installed vCenter and tried to add the first host and received a very cryptic 'You do not hold privilege "System > View" on folder ""'.

Hmmm....

Turns out it was because I chose to turn on "Lockdown" mode on all the hosts, which restricts who can administer them. Logging in via iLO to the host itself and disabling lockdown in the VMware console does the trick.

Thursday, 17 March 2011

Best filesystem for virtual machine running SQL Server (part 2)

This is the second part - please read part 1.

I don't care about synthetic tests like bonnie++ or dd if=/dev/zero etc. I am only interested in finding out which configuration runs my test the fastest.

However, it is nice to see the difference :) so these are the results of copying an 18GB file into each of the three configurations:

(recall, first config is XFS partition on single disk, second is RAID0 XFS and third is RAID1 XFS).

(Copying from the non-raid partition into a non-raid partition on the second disk)

time `dd if=test of=/data/singleb/test bs=1M ; sync`

18267+1 records in

18267+1 records out

19155058688 bytes (19 GB) copied, 164.756 s, 116 MB/s

real 2m52.385s

user 0m0.030s

sys 0m26.490s

(Copying from the non-raid partition into the RAID0 partition)

time `dd if=test of=/data/raid0/test bs=1M ; sync`

18267+1 records in

18267+1 records out

19155058688 bytes (19 GB) copied, 266.782 s, 71.8 MB/s

real 4m28.422s

user 0m0.010s

sys 0m27.840s

(Copying from the non-raid partition into the RAID1 partition)

time `dd if=test of=/data/raid1/test bs=1M ; sync`

18267+1 records in

18267+1 records out

19155058688 bytes (19 GB) copied, 400.644 s, 47.8 MB/s

real 6m47.905s

user 0m0.050s

sys 0m26.620s

Copying from one disk to another is the fastest, followed by RAID0 then RAID1. You might have expected RAID0 to be the fastest but recall that the 18GB file is being copied from a partition on the same disk so one of the disks in RAID0 (and RAID1) will be involved in reading as well as writing, hence the slow down.
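As a sanity check, the MB/s figures dd prints are just bytes copied divided by dd's own elapsed seconds, in decimal megabytes. They do not include the trailing sync, which is why "real" is a little longer. A small Java sketch that redoes the arithmetic for the three runs above (DdThroughput is just a name picked for the example):

```java
import java.util.Locale;

final class DdThroughput {
    // dd reports decimal megabytes: 1 MB = 1,000,000 bytes.
    static double megabytesPerSecond(long bytes, double seconds) {
        return bytes / seconds / 1_000_000.0;
    }

    public static void main(String[] args) {
        long bytes = 19_155_058_688L; // size reported by dd for the test file
        String[] labels = {"single disk", "RAID0", "RAID1"};
        double[] seconds = {164.756, 266.782, 400.644}; // dd elapsed times from the runs above
        for (int i = 0; i < labels.length; i++) {
            System.out.printf(Locale.ROOT, "%-12s %.1f MB/s%n",
                    labels[i], megabytesPerSecond(bytes, seconds[i]));
        }
    }
}
```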

OK, since I opened this door, maybe a fairer (but still meaningless :)) test would be to use dd if=/dev/zero so the disks are purely available for writing..... Results of that silly test (time `dd if=/dev/zero of=18G bs=1M count=18000; sync`) are:

single disk:

18000+0 records in

18000+0 records out

18874368000 bytes (19 GB) copied, 156.651 s, 120 MB/s

real 2m43.003s

user 0m0.040s

sys 0m16.760s

raid0:

18000+0 records in

18000+0 records out

18874368000 bytes (19 GB) copied, 82.0213 s, 230 MB/s

real 1m25.195s

user 0m0.030s

sys 0m16.800s

raid1:

18000+0 records in

18000+0 records out

18874368000 bytes (19 GB) copied, 189 s, 99.9 MB/s

real 3m16.593s

user 0m0.040s

sys 0m17.110s

All of this is meaningless really - the copy from one disk to another is probably the closest to the performance you will get in real life. Whilst RAID0 flies for a long sequential write it slows down significantly when it has to be read from as well - i.e. real life.

Based on this, I don't know whether it will be more performant to have the OS and one DB on disk1 and the second DB on disk2 or just both on RAID0... We shall see! :)

DISCLAIMER - RAID0 means losing all your data stored on that partition if *any* disk dies.

Best filesystem for virtual machine running SQL Server (part 1)

I need to run SQL Server in a VM on top of Linux (Ubuntu 10.10) using VirtualBox ('cause getting VMware Player running is just not worth it!).

I have two 640GB WD6402AAEX disks (fairly quick) and the OS is installed in a 64GB software RAID1 partition.

In fact, to be clear:

- /dev/sda1 is 16GB SWAP

- /dev/sda2 is 64GB RAID1 (software)

- /dev/sdb1 is 16GB SWAP

- /dev/sdb2 is 64GB RAID1 (software)

The base machine is a quad core CPU i5 760 (2.80 GHz) with 16GB RAM (purchased from those great people at http://pcspecialist.co.uk/).

The question is, which is the best setup for virtualising SQL Server? My plan is to install a single Server 2008 64-bit VM with 12GB RAM (leaving 4GB for the Ubuntu 10.10 host). That VM will have 48GB for the OS and 64GB for SQL Server data files. It will also have IntelliJ 8 (don't ask) configured to run a development job which sucks data from one database and sticks it in another database. The same configuration (using Windows 7) takes about an hour to run when installed directly onto the hardware.

I plan to test the following scenarios:

- guest OS on /dev/sda3 (XFS), data on /dev/sdb3 (XFS) i.e. no RAID

- guest OS and data on RAID 0 (XFS)

- guest OS and data on RAID 1 (XFS)

*remember* - the host OS is running on a RAID1 partition on the same disks.

Part 2 will show some meaningless synthetic tests and part 3 will show the results of the real test.

XFS or EXT4 for OS desktop?

Which file system to use for the OS? Quite simply, don't use XFS - use EXT4 (or whatever). Installing a minimal Ubuntu server took absolute ages (hours) over XFS on RAID0 and RAID1.

Think about it - XFS *rocks* at large files but its weak spot is tiny files - what does an installation involve? Lots of small files.

Changing it to EXT4 and re-installing reduces it from hours to mere minutes (20 or so - I didn't time).
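If you want to see the small-file effect without sitting through an install, something like the following Java sketch can be pointed at a directory on each filesystem. SmallFileBench, the file count and the file size are all arbitrary choices for illustration, and since it never forces data to disk it mostly measures file creation overhead rather than raw disk speed:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

final class SmallFileBench {
    // Creates `count` files of `size` bytes under `dir`; returns elapsed milliseconds.
    static long writeSmallFiles(Path dir, int count, int size) {
        byte[] payload = new byte[size];
        long start = System.nanoTime();
        try {
            for (int i = 0; i < count; i++) {
                Files.write(dir.resolve("f" + i + ".bin"), payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        // First argument: a directory on the filesystem under test (defaults to the temp dir).
        Path base = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        Path dir = Files.createTempDirectory(base, "smallfiles");
        try {
            long ms = writeSmallFiles(dir, 20_000, 4096);
            System.out.println("20000 x 4KB files in " + ms + " ms under " + dir);
        } finally {
            // Remove the files first, then the directory itself.
            try (Stream<Path> walk = Files.walk(dir)) {
                walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
            }
        }
    }
}
```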

I will be using XFS for storing the virtual machines though - for sure. Just not the OS.

Back to Gnome

(this isn't very interesting, but it is a test of a blogging client)

I find Linux really usable. I do a lot of work in a command line over ssh on a lot of machines.

I also use XenServer (which requires Windows).

I also do development with other developers using Git that *has* to have Windows command lines.

So, up till now I have used Windows and PuTTY, but it has always felt a bit lame - PuTTY rocks, but it isn't the same as having a rich command line from which to launch ssh.

Cygwin didn't cut it for me because it runs inside the lame, crippled and just terrible, terrible Windows command box (whatever it is called).

Finally, I now have a box worth something (16GB RAM, two fast hard disks) which means I can run Windows inside a VM on top of Linux.

Yeah.

So, which distro to use? I really liked the look of KDE 4.6, but I am a Debian dude and I will never go near Kubuntu. Ever. openSUSE seems to be the one to go to for KDE, so sure, let's give it a go.

Wow! This rocks - super speedy, YaST is infinitely better than it was 5 years ago. Everything just works.

Until it doesn't. After a few hours/days I noticed a few things just stopped working - the KDE panel at the bottom stretched beyond the edge of the monitor (I have a large and small monitor and the small one is the primary - I think it got a bit confused). The network icon was the last to go - sure, the network still worked but I don't want to see a red X. OK - easy enough to remove. Then I tried to sort out using my plug-in headset as the primary input and output. Phew - after trying to set it in three different places (!) I managed to get it to play sounds, but Skype still didn't want to take the mic. Eventually, it worked, but don't ask me how.

I am looking for a desktop that *just works*, and the only one I know of that does that is Debian. Debian squeeze has just been released so it isn't too stale (KDE 4.4.5 + upstream patches), so off to download that.

Wow - it flies! Install the NVIDIA driver - and lock up. Total lock up. I haven't got time for this.

Finally - Ubuntu - OK, means Gnome, but I am not so sure that is a bad thing after my recent play with KDE.

Wow - everything just works. I mean *everything*. It prompted me that I needed some "naughty" firmware, click the relevant button and off it goes. Ubuntu Software Center is pretty neat as well.

So, in conclusion - I am shocked. Ubuntu - you are my saviour - who would have thought.