Thursday, 4 August 2011

Transparent Page Sharing - efficiency of disabling Large Pages


Finally I have found some free time to prove how efficient Transparent Page Sharing (TPS) really is. To do so, I had to disable Large Page support on all 10 hosts of our vSphere 4.1 cluster.

Let me briefly describe our virtual server farm. We have 10 ESXi hosts with two Xeon 5650 CPUs in each host. Six hosts are equipped with 96 GB of RAM and four hosts with 48 GB, which gives us 768 GB in total. Since all CPUs support hardware-assisted MMU, all ESXi hosts aggressively back virtual machines' RAM with Large Pages. Therefore, TPS won't kick in unless there is memory contention on a host.

We run 165 virtual machines with 493 GB of assigned vRAM. According to Figure 1, during the last week we had 488 GB of consumed RAM and 33 GB of shared RAM. When I first started looking into TPS I was very surprised to see some shared memory even though Large Pages were enabled. However, now I know that these 33 GB are just zero pages, which are shared as soon as a VM is powered on, without even being allocated in the physical RAM of the ESXi host.



In Figure 2 we can see that CPU usage has been very low across the whole cluster for the last week. According to VMware, enabling Large Pages can be quite effective with regard to CPU usage; VMware's documents show about 25% lower CPU usage with LP enabled. Considering that our CPU usage has been only about 3%, even a comparable relative increase would take us to roughly 4%, so I saw no problem with raising it a little to get the benefits of TPS.



It took me a while to disable LP support on all hosts and then put each host in and out of maintenance mode, one by one, so that the VMs would start taking advantage of TPS. In Figure 3 you can see the memory stats for the period when I was still moving VMs around. The shared memory value is not final yet, but you can already see how fast the ESXi hosts scan and share VM memory.
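For anyone who would rather script the change than click through each host, below is a minimal sketch that sets the Mem.AllocGuestLargePage advanced option (the setting that controls large page backing) to 0 on every host via pyVmomi. This is just one possible way to do it, and the vCenter address and credentials are placeholders.

    # Sketch: disable large page backing on every ESXi host via pyVmomi.
    # The vCenter address and credentials below are placeholders.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host='vcenter.example.local',
                      user='administrator', pwd='password')
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            # 0 = do not back guest memory with large pages, so TPS can
            # collapse identical 4 KB pages as usual
            host.configManager.advancedOption.UpdateOptions(changedValue=[
                vim.option.OptionValue(key='Mem.AllocGuestLargePage', value=0)])
            print('Disabled large page backing on', host.name)
        view.Destroy()
    finally:
        Disconnect(si)

Note that running VMs keep their existing large-page backing until they are vMotioned or power-cycled, which is why the maintenance mode dance above was needed.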


Since we disabled Large Page support, we should get a higher rate of TLB misses, which in turn should lead to more CPU cycles spent walking through two sets of page tables (a rough worst-case estimate follows the list):

  1. The VM's page tables, to find the virtual address to guest physical address mapping
  2. The nested page tables, to find the guest physical address to host physical address mapping
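As a rough illustration of why this was a concern (assuming the usual x86-64 case of 4-level guest page tables and 4-level nested page tables), the worst-case cost of a single TLB miss looks like this:

    # Worst-case memory references for a 2D (nested) page walk, assuming
    # 4-level guest page tables and 4-level nested (host) page tables
    guest_levels = 4    # guest virtual -> guest physical
    nested_levels = 4   # guest physical -> host physical

    # Every guest page-table reference is a guest-physical address and needs a
    # full nested walk of its own, plus one more nested walk for the final access
    worst_case = guest_levels * nested_levels + guest_levels + nested_levels
    print(worst_case)   # 24 memory references, versus 4 for a native page walk

Large pages shorten these walks and reduce TLB pressure, which is exactly the benefit we give up by disabling them.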

However, if you look at Figure 4 you won't see any increase in CPU usage. The spikes you can see there are mostly caused by massive vMotions from host to host.



Unfortunately, I couldn't do all 10 hosts at once during working hours, as there are some virtual MS Failover Clusters running in our vSphere and I needed to fail over some cluster resources before moving a cluster node off its ESXi host, so the last host was done only 5 hours later. You can see in Figure 5 that the last host was finished around 7:20 pm. Within 5 hours of the change being implemented we already had 214 GB of shared memory. Consumed memory decreased from 488 GB to 307 GB, saving us about 180 GB already. As you can see in Figure 6, there is still no CPU usage penalty due to the disabled Large Pages.
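Just to spell out the arithmetic behind these figures (the GB values are taken straight from the charts, so treat the percentages as rough):

    # GB figures from Figures 5/6, roughly 5 hours after the change
    consumed_before = 488
    consumed_after = 307
    shared = 214

    saved = consumed_before - consumed_after
    print(saved)                                 # ~181 GB reclaimed so far
    print(round(saved / consumed_before * 100))  # ~37% of previously consumed RAM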



In the couple of screenshots below you can see the stats 24 hours after the change was made.



I have seen some posts on the VMware Communities where people were asking about the CPU impact of disabling Large Pages. The conclusion of this post can be a good answer to all those questions: switching off Large Pages saved us 210 GB of RAM, and CPU usage still stays very low.
With the vSphere 5 release the benefits of TPS became less significant, but I guess there will be a lot of companies staying with vSphere 4.1 for a while. Also, the new vRAM entitlements have made TPS a bit more useful.

Nevertheless, it is difficult to generalise the performance impact of disabling Large Pages. I wouldn't recommend doing so without proper testing, as TPS efficiency can vary significantly depending on the specifics of your vSphere environment.

Update: I have been using Veeam Monitor to collect all these stats in the Cluster view, but today I also checked the vCenter performance stats and discovered that there is a difference in the Shared and Consumed values. However, if I check the Resources stats (which include all VMs) in Veeam Monitor, I get the same figures as I see in vCenter. According to vCenter the amount of shared memory is even higher: 280 GB against 250 GB in Veeam Monitor. That basically means more than 50% of physical memory savings!
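Here is a quick check of where the "more than 50%" comes from; I'm assuming the savings are measured against the roughly 488 GB that was consumed before Large Pages were disabled:

    # Rough check of the "more than 50%" figure, measured against the
    # ~488 GB that was consumed before the change (an assumption on my part)
    consumed_before = 488
    for tool, shared in (('vCenter', 280), ('Veeam Monitor', 250)):
        print(tool, round(shared / consumed_before * 100),
              '% of previously consumed RAM')
    # vCenter ~57%, Veeam Monitor ~51% -- more than half either way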



3 comments:

  1. Great post. I have been testing this on a couple of my clusters and have seen dramatic memory savings with only a little CPU increase. I'm considering doing this across all my clusters to reclaim more memory. Have you seen any long-term negative impacts?

    Thanks

  2. To be honest, I haven't noticed any drawbacks of disabling Large Pages on the two clusters so far.

  3. I have it disabled on a 14-node cluster now with about 300 VMs. It's been about 4 days now with no issues and about 30% savings in memory. I'll let you know if there are any issues. Thanks again for the response.
    Joe
