High Co-Stop Stats and Snapshots?!

So I was running into an interesting issue this past week that I thought I would write up.

Seemingly out of nowhere, I started seeing high CPU Ready times at random intervals, which was badly affecting performance in one of my environments. This particular environment had a lot of VMs rocking 2 vCPUs (and a few with more). So, being the good admin I am, I started reducing VMs to just one vCPU.

BUT, it didn’t fix the issue. I was kind of worried about that anyway: nothing had really “changed” in my environment, so I couldn’t figure out why there would suddenly be high CPU Ready times.

So I started digging through every other stat I could find, looking for something that made sense, and that is when I came across CPU Co-Stop.

Co-Stop is defined in vCenter as “Time the VM is ready to run, but is unable to due to co-scheduling constraints.” Huh! That sounds an awful lot like CPU Ready, except for VMs with multiple vCPUs. Alright, let’s look at a Co-Stop graph in vCenter…

[Graph: CPU Co-Stop time, first host]

[Graph: CPU Co-Stop time, second host]

This is a stacked graph of the VMs, and I know it’s hard to read, but some of the biggest offenders had a maximum CPU Co-Stop time of 638,521 ms. For those playing at home, that is over 638 seconds!!! Ridiculous!!!
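vCenter reports Co-Stop (like CPU Ready) as a summation in milliseconds per sample, so it can help to turn that number into seconds and into a percentage of the sample interval. Below is a minimal Python sketch of that arithmetic. It assumes the usual vCenter chart sample intervals (20 s real-time, 300 s past day, 1800 s past week, 7200 s past month, 86400 s past year) and the percentage formula VMware documents for CPU Ready summation values. The post doesn't say which chart range the graphs came from, so the script simply tries each one. The helper names are hypothetical.

```python
# Assumed sample interval (seconds) for each vCenter performance chart range.
CHART_INTERVALS_S = {
    "Real-time": 20,
    "Past Day": 300,
    "Past Week": 1800,
    "Past Month": 7200,
    "Past Year": 86400,
}


def summation_ms_to_seconds(value_ms: float) -> float:
    """Convert a summation value in milliseconds to seconds."""
    return value_ms / 1000.0


def summation_ms_to_percent(value_ms: float, interval_s: int, vcpus: int = 1) -> float:
    """Express one sample's summation value as a percentage of that sample's length.

    value_ms:   summation reported for a single sample, in milliseconds
    interval_s: length of that sample, in seconds
    vcpus:      divide a VM-level value (summed over vCPUs) into a per-vCPU average
    """
    return value_ms / (interval_s * 1000.0) * 100.0 / vcpus


if __name__ == "__main__":
    peak_ms = 638521  # peak Co-Stop value read off the graph
    print(f"{peak_ms} ms = {summation_ms_to_seconds(peak_ms):.1f} s")
    for name, interval in CHART_INTERVALS_S.items():
        pct = summation_ms_to_percent(peak_ms, interval)
        print(f"{name:>10} ({interval:>5} s samples): {pct:.1f}%")
```

A single vCPU can't be stalled for longer than the sample itself, so a per-vCPU result above 100% is a hint that you picked the wrong interval (or that the value is summed across several vCPUs).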

The other thing to notice about the graphs is that they are from two DIFFERENT hosts, over the same time period. They look eerily similar, huh?!

By this point my curiosity was WAY PIQUED. I mean, for two hosts to be having CPU issues at the exact same times… something is seriously going on here.

Then I found this VMware article, which says snapshots can be the culprit behind high Co-Stop times:

Snapshots introduce complexity to storage I/O. Due to the nature of snapshots, every read operation must traverse every snapshot disk and then the base disk in order to verify the appropriate disk block to return. Because these extended read operations are required, snapshots are the most performance-intensive disk format for virtual disks (as opposed to thin-provisioned, thick-provisioned, or eager-zeroed thick-provisioned virtual disks).
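To make the quoted behavior concrete, here is a toy Python model of a snapshot chain: writes made after a snapshot land in the newest delta disk, and a read checks the deltas from newest to oldest before falling back to the base disk. It is a simplified, hypothetical sketch (the `Disk` and `SnapshotChain` names are made up, and real VMDK delta disks use grain tables and are far more involved), but it shows why a block that hasn't been rewritten since the first snapshot has to be looked up in every delta plus the base disk, so the cost of a read grows with the length of the chain.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Disk:
    """A toy virtual disk: maps block number -> block contents."""
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)


@dataclass
class SnapshotChain:
    """A base disk plus a list of delta disks, oldest first."""
    base: Disk
    deltas: list[Disk] = field(default_factory=list)

    def take_snapshot(self) -> None:
        # After a snapshot, new writes go to a fresh, empty delta disk.
        self.deltas.append(Disk(name=f"delta-{len(self.deltas) + 1}"))

    def write(self, block: int, data: bytes) -> None:
        # Writes always go to the newest disk in the chain.
        target = self.deltas[-1] if self.deltas else self.base
        target.blocks[block] = data

    def read(self, block: int) -> tuple[bytes | None, int]:
        """Return (data, number of disks consulted to find it)."""
        consulted = 0
        for disk in reversed(self.deltas):  # newest delta first
            consulted += 1
            if block in disk.blocks:
                return disk.blocks[block], consulted
        consulted += 1  # fell through every delta; check the base disk
        return self.base.blocks.get(block), consulted


if __name__ == "__main__":
    chain = SnapshotChain(base=Disk("base", {0: b"os", 1: b"data"}))
    for _ in range(3):
        chain.take_snapshot()
    chain.write(1, b"new-data")
    print(chain.read(1))  # (b'new-data', 1): found in the newest delta
    print(chain.read(0))  # (b'os', 4): 3 deltas checked, then the base
```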

It makes a lot of sense, but it’s something I hadn’t really thought about before… wait, when was the last time I checked on my snapshots? Let’s look at the Storage View…

[Screenshot: snapshot space in the Storage View]

Oh nice… 7 snapshots over 10 GB in size, and one at 36 GB!!! Oh dear… those must be left over from the upgrade from Server 2008 to Server 2008 R2.

This obviously needs to be remedied.

As it turns out, after I removed and consolidated some VM snapshots, my hosts are much happier than they had been, and I no longer see the Co-Stop issues!

Reference:

High co-stop (%CSTP) values seen during virtual machine snapshot activities
