Tuesday 12 April 2016

Why I prefer to disable vSphere HA Admission Control



According to the vSphere documentation, "vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected."

Admission Control guarantees that enough capacity is available for virtual machine failover, and it works at three different levels - Host, Resource Pool and Cluster. Only the cluster-level Admission Control is configurable, and that's what I want to talk about.

Even though the idea behind Admission Control is pretty simple, the mechanism itself is a complex topic, and according to the best book about vSphere HA, by Duncan Epping and Frank Denneman, "Admission Control is more than likely the most misunderstood concept... and because of this it is often disabled".

To achieve its goal, Admission Control can use the following actions:
  • Restrict powering on a virtual machine. 
  • Disallow migrating a virtual machine onto a host or into a cluster or resource pool. 
  • Restrict increasing the CPU or memory reservation of a virtual machine. 
In simple words, vCenter uses Admission Control to make sure there are enough resources to power on VMs in case one or more hosts fail - not to guarantee that those VMs get the same set of resources they had before.

To achieve this goal, Admission Control can use three different mechanisms. Let's discuss what difficulties a vSphere admin may face with each of them.

Host Failures Cluster Tolerates


Slot size will be the main factor defining when vCenter won't let you power on another VM. A slot is sized by the largest CPU and memory reservations among the powered-on VMs in the cluster, so this admission control type gets skewed very easily by one large VM with all of its RAM reserved.
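To make that skew concrete, here is a minimal Python sketch of the slot maths. The host sizes, VM reservations, and the 32 MHz / ~80 MB defaults are illustrative assumptions, not values from a real cluster:

def slot_counts(hosts, vms, cpu_slot_mhz=None, mem_slot_mb=None):
    """hosts: list of (cpu_mhz, mem_mb); vms: list of (cpu_res_mhz, mem_res_mb)."""
    # Slot size defaults to the largest reservation in the cluster;
    # 32 MHz and a small memory overhead (~80 MB assumed here) apply
    # when no reservations are set at all.
    cpu_slot = cpu_slot_mhz or max((v[0] for v in vms), default=0) or 32
    mem_slot = mem_slot_mb or max((v[1] for v in vms), default=0) or 80
    # Each host contributes the more restrictive of its CPU and memory slot counts.
    return [min(h[0] // cpu_slot, h[1] // mem_slot) for h in hosts]

hosts = [(20000, 131072)] * 4            # 4 hosts: 20 GHz CPU, 128 GB RAM each
small_vms = [(0, 0)] * 50                # 50 VMs with no reservations
print(slot_counts(hosts, small_vms))     # [625, 625, 625, 625]

# One 64 GB VM with all of its RAM reserved skews the slot size for everyone:
print(slot_counts(hosts, small_vms + [(0, 65536)]))   # [2, 2, 2, 2]

A single fully reserved VM takes each host from hundreds of slots down to two, even though the other 50 VMs haven't changed at all.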

Yes, you can define a custom slot size, but deciding on the right size isn't a trivial task. If you set it too low, you can't be sure you have enough reserved resources to cover a host failure. If you set it too high, you will end up with fewer slots for your VMs, so you may find yourself with a cluster that has plenty of available resources but won't let you deploy a new VM. Also, most vSphere environments constantly grow and change; therefore, the slot size has to be adjusted accordingly.

One can calculate the average VM size and use those values for the custom slot size. However, while vCenter will reserve enough slots to tolerate the failure of the configured number of hosts, it won't guarantee that any single host will have enough resources to power on your largest VM.

An HA cluster is smart enough to ask DRS to shuffle VMs around to make space for your large business-critical server, but there is still no guarantee this VM will restart successfully. Moreover, we all know that DRS is part of vCenter, so if vCenter happens to be on the failed ESXi host, there will be no DRS to take care of resource defragmentation in the cluster.

Percentage of Cluster Resources Reserved


This Admission Control policy seems smarter and more flexible than the first one: vCenter doesn't use a fixed slot size any more. This is what most vSphere admins recommend using, but nobody stresses that for this policy to work you need to set a CPU/RAM reservation on every single VM in the cluster.

So here is what vCenter does (sketched in Python after the list):
  1. Calculate how many resources it has in the cluster. 
  2. Calculate the total reserved virtual machine resources. 
  3. Calculate the available resources by subtracting the reserved resources from the total resources. 
  4. Ensure that the available resources stay above the percentage of cluster resources reserved for failover (that's the percentage you configure). 
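Here is a minimal Python sketch of that calculation. The host sizes and the 32 MHz / ~80 MB per-VM defaults are illustrative assumptions:

def failover_capacity(total_mhz, total_mb, vms):
    """vms: list of (cpu_reservation_mhz, mem_reservation_mb).
    VMs without reservations count only as the defaults: 32 MHz
    plus a small memory overhead (~80 MB assumed here)."""
    reserved_mhz = sum(max(v[0], 32) for v in vms)            # step 2
    reserved_mb = sum(max(v[1], 80) for v in vms)
    cpu_free = (total_mhz - reserved_mhz) / total_mhz * 100   # steps 1 and 3
    mem_free = (total_mb - reserved_mb) / total_mb * 100
    return min(cpu_free, mem_free)   # step 4 compares this to your configured %

# 10 hosts x 20 GHz / 256 GB, running 300 VMs with no reservations at all:
print(round(failover_capacity(10 * 20000, 10 * 262144, [(0, 0)] * 300)))  # 95

Three hundred unreserved VMs barely dent the calculated failover capacity - which is exactly the kind of number you see in the screenshot below.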

As you can see, the calculation is not based on the resources assigned to a VM. This equation works with RESERVED resources only.

So if you want properly working admission control, you need to assign a reservation to each of your VMs. Otherwise, you may end up with a situation like the one in the following screenshot, where the cluster runs almost 300 VMs, but Admission Control thinks you have 98% of Cluster Failover Capacity.


This is a very common situation: people don't use reservations at all, or they use Resource Pool reservations (which are not taken into consideration by Admission Control). In this case Admission Control uses the default 32 MHz and the memory overhead in its calculation, and your cluster will run out of resources way before Admission Control kicks in and prohibits powering on new VMs.

Well, you could say: let's use VM reservations then to make Admission Control work properly. But I have to disagree, for the following reasons:

  • Reserved RAM can't be re-allocated to other VMs even if it's not used at all. That's valid for Windows VMs, as they zero all RAM pages during boot; Linux VMs don't touch RAM pages until they really need them, so those 'untouched' memory pages can still be re-allocated to other VMs. 
  • It also leads to higher administrative and operational overhead. Imagine micro-managing correct reservations for thousands of VMs and then keeping those reservations up to date. 
  • The VMware Performance Best Practices guide recommends using Resource Pool reservations instead of VM-level reservations.

On top of that, you have to remember to change the percentage every time you change the number of hosts. And if you have an unbalanced cluster, you need different percentage reservations for CPU and RAM.
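The arithmetic itself is trivial, which makes it all too easy to forget. A quick sketch, assuming all hosts are the same size:

def failover_percentage(hosts, failures=1):
    # Share of cluster capacity that the failed host(s) represent,
    # assuming a uniform cluster.
    return failures / hosts * 100

print(failover_percentage(8))    # 12.5 -> configure 13%
print(failover_percentage(10))   # 10.0 -> remember to lower it after adding hosts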

This approach doesn't solve the problem of resource defragmentation either. DRS still needs to kick in if a host doesn't have enough resources to restart a VM. And again, if vCenter is down, there is no DRS, which means there will be no defragmentation and some large critical VMs won't power on.


Dedicated Failover Host


This is the least preferred option according to multiple blogs and books on HA Admission Control, but I strongly disagree with that opinion.

First of all, it is very simple and doesn't require a lot of planning. It is pretty much 'what you see is what you get'.

The main reason people recommend against dedicated failover hosts is that these hosts sit unutilised until other hosts fail. But that's just as applicable to the other Admission Control mechanisms: you still have to reserve resources for failover, which means those reserved GHz and GB sit unused anyway.

This approach has no issues with resource defragmentation and doesn't need DRS. So even if vCenter fails together with the failed host, you still get your large VMs restarted.

This Admission Control type comes with some drawbacks:
  • When you have hosts in this hot-standby mode, the vCPU/pCPU ratio gets worse because fewer physical CPUs participate in servicing vCPUs, which can impact CPU Ready time (see the quick calculation after this list). 
  • Another problem arises if such a server is part of a VSAN cluster - that would be too much waste of resources. 
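To put a number on the first point, here is a quick back-of-the-envelope calculation (the host and vCPU counts are made up for illustration):

# vCPU/pCPU ratio with and without one host held in hot standby.
total_vcpus = 400
hosts, cores_per_host = 4, 32

all_active = total_vcpus / (hosts * cores_per_host)          # 3.125:1
one_standby = total_vcpus / ((hosts - 1) * cores_per_host)   # ~4.17:1
print(f"{all_active:.2f}:1 vs {one_standby:.2f}:1")          # 3.12:1 vs 4.17:1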
As you can see, none of the Admission Control mechanisms is ideal. None of them fits all sizes, and each requires thorough planning, testing and regular revision.

Contrary to what Duncan and Frank said, I prefer to disable HA Admission Control precisely because I understand how it works. Instead, I think it is sufficient to have a good monitoring system in place to track vSphere utilisation levels and resource usage trends so that you can do capacity planning in advance. The vRealize Operations suite is a good example of such a monitoring system.

I am not trying to say it is useless in all situations, but there were very few companies where I saw Admission Control correctly configured and properly looked after - and all of those few companies used a dedicated failover host.

And the bigger problem was that people blindly relied on HA Admission Control even though it doesn't work correctly without per-VM reservations or a right-sized slot.
